SOME ASPECTS IN CLASSIFICATION OF 
REMOTELY SENSED IMAGES USING ERROR 
BACKPROPAGATION ARTIFICIAL NEURAL 

NETWORK 


By 

Kumar Ashish Tiwari



DEPARTMENT OF CIVIL ENGINEERING 

Indian Institute of Technology Kanpur 

APRIL, 2002 



Some Aspects in Classification of Remotely Sensed 
Images using Error Back-propagation 
Artificial Neural Network 


A thesis submitted

in partial fulfillment of the requirements 
for the degree of 


Master of Technology 


by 

Kumar Ashish Tiwari 



to the 

Department of Civil Engineering 
INDIAN INSTITUTE OF TECHNOLOGY KANPUR 


April 2002 



CERTIFICATE 








It is certified that the work presented in this thesis titled "Some Aspects in Classification of
Remotely Sensed Images using Error Back-propagation Artificial Neural Network" has
been carried out by Kumar Ashish Tiwari (KN. Y0 10323) under my supervision and has 
not been submitted elsewhere for a degree. 


Date:                                  Onkar Dikshit

Assoc. Professor 
Department of Civil Engineering 

IIT Kanpur 







Acknowledgment 


I express my deep sense of gratitude to my thesis supervisor, Dr. Onkar Dikshit 
for his insightful teaching of the subject matter. I also thank him for various 
creative suggestions about the content and organization of my thesis. 

I also thank all my teachers at IITK for imparting the very best of education. I
also thank Virendra Pathak and S.K. Katiyar, Ph.D. scholars with the
Geoinformatics division, for their continuous encouragement, suggestions and 
support. 

I express thanks to my batch mates K.M. Reddy, Vibhor Jain, Harveen Punihani,
Deepak R. Mishra and Jadunandan Dash, with whom I had a great time studying
various courses. I also thank my wing mates Vinod Pathari, Prithvi P. Singh Bisht,
Pijush Ghosh and Jayadeep U.B. for making my stay very pleasant and enjoyable.
Thanks are due to my seniors and juniors also for giving me a memorable stay. 

I also thank Mr. Igor Fisher (University of Tübingen, Germany), who is one of the
members of the JavaNNS development team, for his help with the JavaNNS software
through e-mail during the initial stages.

Last but not the least, I want to thank Mr. G.P. Mishra for his affection, love 
and caring throughout my stay. Thanks are due to Ramkishen also.


Date: April 2002
Kanpur 


Kumar Ashish Tiwari 



Contents 


List of Tables
List of Figures
List of Plates
Abstract


1 Introduction 

1.1 Background of the work 

1.2 Problem definition 

1.3 Objectives and scope of the work 

1.4 Overview of data used and study area 

1.5 Software and hardware details 

1.6 Organization of work 

2 Literature Review 

2.0 Introduction 

2.1 Classification or Pattern recognition 

2.2 Gaussian maximum likelihood classification 

2.3 Texture 

2.4 Artificial neuron 

2.4.1 The biological neuron 

2.4.2 The Mathematical equivalent model 

2.5 Artificial neural network 

2.6 Hidden layer size assessment 

2.7 Optimal sample size assessment 

2.8 Accuracy assessment 

2.9 Work by previous researchers 



3 Theory of back-propagation ANN

3.0 Introduction

3.1 Supervised and unsupervised learning

3.2 Neural network learning rules

3.3 ANN as a classifier

3.3.1 Multi layer network

3.3.2 Momentum method

3.3.3 Generalized delta learning rule

3.3.4 Error back propagation training algorithm

4 Methodology

4.0 Introduction

4.1 Sample set selection

4.2 Classifications

4.2.1 Classification using Gaussian maximum likelihood classifier

4.2.2 Classification using back-propagation ANN with only spectral values

4.2.2.1 Classification with back-propagation ANN, while using varying number of hidden nodes

4.2.2.2 Classification with back-propagation ANN, while using training sets of different sizes

4.2.3 Classification with back-propagation ANN while using combined spectral and texture features

4.3 Classification of images using JavaNNS

5 Results and Discussion

5.0 Introduction

5.1 GML Classification

5.2 Effect of number of hidden nodes on classification accuracy

5.3 Generalization property of the neural network

5.4 Effect of training set size on accuracy

5.5 Effect of using texture

6 Conclusion

6.1 Conclusions

6.2 Recommendations for future work

References

Appendix A (Sample set characteristics)
Appendix B (List of C programs)
Appendix C (Plates)



List of Tables


1.1 Satellite data characteristics for study areas

2.1 Summary of various data encoding and network structure

2.2 Summary of the work done by recent investigators in the field of back-propagation ANN

4.1 Classes in the study sites

5.1 Kappa values obtained for GML classification

5.2 Kappa and Z-statistics values for Lucknow area when using varying number of nodes

5.3 Kappa and Z-statistics values for Bhopal area when using varying number of nodes

5.4 Summary of results for varying hidden node ANN classification

5.5 Comparison of GMLC and ANN for generalization ability

5.6 Testing set kappa values when using different sized training sets

5.7 Comparison of conventional (asm) and window texture methods for Lucknow case study

5.8 Comparison of conventional (asm) and window texture methods for Bhopal case study

A.1 Sample set characteristics (Lucknow): train80
A.2 Sample set characteristics (Lucknow): test79
A.3 Sample set characteristics (Bhopal): train80
A.4 Sample set characteristics (Bhopal): test88



List of Figures


1.1 Locations of the study sites

2.1 The biological neuron

2.2 A corresponding mathematical model of neuron i

2.3 (a) Bipolar and (b) unipolar activation functions

2.4 A layered feed forward network with two layers

3.1 Block diagrams for (a) supervised and (b) unsupervised learning

3.2 Error back propagation training algorithm flow chart

4.1 Implementation of texture using window method (3x3 window example)

4.2 Flow chart of classifying images using JavaNNS

5.1 GML accuracies of group 1 classes

5.2 GML accuracies of group 2 classes

5.3 ANN accuracies for Lucknow site (selected classes)

5.4 ANN accuracies for Bhopal site (selected classes)

5.5 ANN accuracies for different training set sizes (selected classes)

5.6 Texture accuracies for Lucknow site using conventional and window approaches (selected classes)

5.7 Texture accuracies for Bhopal site using conventional and window approaches (selected classes)



List of Plates


Plate 1.1 FCC of Lucknow area generated using NIR, red and green bands.

Plate 1.2 FCC of Bhopal area generated using NIR, red and green bands.

Plate C1.1 GML classified image of Lucknow.

Plate C1.2 ANN classified image of Lucknow using 10 hidden nodes.

Plate C1.3 ANN classified image of Lucknow using train40 sample set.

Plate C1.4 GML classified image of Lucknow using asm (7 x 7) texture approach.

Plate C1.5 ANN classified image of Lucknow using window (7 x 7) based texture approach.

Plate C2.1 GML classified image of Bhopal.

Plate C2.2 ANN classified image of Bhopal using 10 hidden nodes.

Plate C2.3 ANN classified image of Bhopal using train40 sample set.

Plate C2.4 GML classified image of Bhopal using asm (7 x 7) texture approach.

Plate C2.5 ANN classified image of Bhopal using window (7 x 7) based texture approach.



Abstract 


Neural networks can be successfully used to classify remotely sensed images.
This study investigates how network size, training sample set size and the
incorporation of textural information affect neural network classification
accuracy. Results show that network size has no significant
effect on classification accuracy, although a network of some minimum
complexity is required for classification. A minimal training sample set
comprising 10n pixels/class (where n is the number of features) can perform
similarly to bigger samples. A neural network may give better classification
accuracy for spectrally varying classes than the GML classifier. Texture
information is incorporated using conventional and window-based
approaches. The window-based approach classifies spectrally varying classes
better than the conventional approach. This may be because
the window-based approach incorporates neighborhood information better.



1 


Introduction 


1.1 Background of the work 


The remote sensing satellites Landsat, SPOT, ERS, IRS etc. are producing huge
amounts of data that can be used for various scientific and technological
advancements. While such data give us an opportunity to address many
fundamental environmental issues in more depth, it also opens up new 
challenges for data processing and data interpretation. Traditional parametric 
statistical approaches to supervised classification like Gaussian maximum 
likelihood classifier, and minimum distance-to-means, have been successfully 
used in the past for classifying remotely sensed imagery. These approaches 
have certain limitations. They depend on the assumption of a multivariate 
Gaussian (normal) distribution for the data to be classified. Each class in 
feature space is assumed to have an n- dimensional (where n is the number of 
input wavebands) multivariate Gaussian distribution. 

The problem with the statistical approach is that the data in the feature space 
may not follow the assumed model (Atkinson and Tatnall, 1997). Further, it is 
possible that a single class may be represented at multiple places in the feature 
space. This is particularly the case when classes are heterogeneous in nature. 
Consequently, statistical approaches may be seen as restrictive because of the 
underlying assumption of the model. A further problem with statistical 
approaches is that they require non-singular (invertible) class specific 
covariance matrices. 






One of the main advantages of neural networks for classification is that they 
are distribution free, that is, no underlying model is assumed for the 
multivariate distribution of the class specific data in feature space. It is, 
therefore, possible for a single class to be represented in feature space as a 
series of clusters in place of a single cluster. Thus, a fundamental difference
between statistical and neural approaches to classification is that statistical
approaches depend on an assumed model whereas neural approaches depend on
the data. It is for this reason that neural networks are suitable for integrating data from
different sources, and it is in this context that artificial
neural networks are currently being applied in a wide variety of remote sensing
applications. The use of artificial neural networks for remote sensing data
interpretation has been motivated by the realization that the human brain is 
very efficient at processing vast quantities of data from a variety of different 
sources. Neurons in the human brain receive inputs from other neurons to 
produce an output, which is then passed to other neurons. A mathematical 
model based on the actions of the biological neuron may be implemented to 
process and interpret different types of data. In this sense, neural networks are
an artificial intelligence approach. Neural networks in the simplest sense can be 
seen as data transformers (Pao, 1989), where the objective is to associate the 
elements in one set of data with the elements in the second set. When applied 
to classification, they are concerned with the transformation of data from 
feature space to class space. Thus, neural networks belong to the same class of 
techniques as automated pattern recognition. The rapid increase in use of 
neural networks is due to their ability to (Atkinson and Tatnall, 1997): 





(i) perform more accurately than other techniques such as statistical 
classifiers, particularly when the feature space is complex and the 
source data has different statistical distributions; 

(ii) perform more rapidly than other techniques such as statistical 
classifiers; 

(iii) incorporate a priori knowledge and realistic physical constraints into 
the analysis; 

(iv) incorporate different types of data (including those from different 
sensors) into the analysis, thus facilitating synergistic studies. 

1.2 Problem definition 

Although neural networks have been used to classify data accurately, there are
problems in their use. One basic issue is the desired size or complexity of the
network (Foody and Arora, 1997). This is effectively a function of the number
of weighted connections in the network, which in turn is a function of the
number of network layers and the number of units in each of these layers.
Compared with a simple network, a complex network is better able to achieve an
accurate characterization of the training pixels, but has a lower capacity to
generalize and correctly classify testing pixels. Other issues like the number of
training iterations may also have an effect on the learning and generalization 
ability of the network. Since the number of input and output units are 
effectively fixed by design, the problem in defining architecture relates to the 
nature of hidden layer(s). Specifically, the researcher has to find out how many 
hidden layers and units to use in defining the network architecture. 
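
The dependence of network size on the number of layers and units can be illustrated with a small sketch. The function name `count_weights`, the use of one bias weight per unit and the class count of 7 are assumptions made only for illustration; they are not values taken from this study.

```python
def count_weights(layer_sizes, bias=True):
    """Count the weighted connections of a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, input layer first.
    With bias=True, one extra (hypothetical) bias weight is counted per
    non-input unit.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out          # connections between adjacent layers
        if bias:
            total += n_out             # one bias weight per receiving unit
    return total


# Four input bands, one hidden layer of varying size, 7 output classes
# (the class count is illustrative only).
for hidden in (2, 5, 10, 20):
    print(hidden, count_weights([4, hidden, 7]))
```

With the input and output layers fixed, the count grows linearly with the number of hidden units in a single hidden layer, which is why the hidden layer is the main design choice.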





The characteristics of the data used to train a supervised classification have a
considerable influence on the quality of the resulting classification (Campbell,
1981). It is essential that the training data provide a representative description
of each class. A requirement is that the size of the training set be at least 10 to 30
times the number of bands (Mather, 1987; Piper, 1992). The required training
size may therefore be large. The number could rise further because of other
issues (e.g. the effect of background soil on the spectral response of a crop).
Thus, acquiring such training sets is difficult for classification involving a large 
number of classes. Consequently, many investigations have been based on 
sample sizes below the generally accepted guidelines and thus may not have 
fully exploited the information content of the remotely sensed data. The lack of 
distributional assumptions makes the artificial neural network an attractive
alternative to the conventional statistical classification schemes. It has been
proposed that artificial neural network classification may be performed on
smaller, even minimal, training sets (Hepner et al., 1990; Foody et al., 1995).

Neighborhood information can be incorporated in the form of context and texture.
When humans visually interpret remotely sensed imagery, they synergistically
take into account, in addition to spectral value, context, edges and texture.
Conversely, most digital image processing classification algorithms are based 
only on the use of spectral (tonal) information. Thus, it is not very surprising 
that there have been efforts to incorporate some of these other characteristics 
into digital classification procedure. A discrete tonal feature is a connected set 
of pixels that have the same or almost the same gray shade. When a small area 
of image (e.g., a 3x3 area) has little variation of discrete tonal features, the 
dominant property of that area is a gray shade (Jensen, 1996). Conversely, when 
a small area has a wide variation of discrete tonal features, the dominant
property of that area is texture. Most researchers trying to incorporate texture
into the classification process have attempted to create a new texture image that 
can then be used as another feature or band in the classification. Thus, each 
new pixel of the texture image has a brightness value that represents the texture 
at that location. Another alternative (Bischof et al., 1992) is presenting the
network with the entire window of pixels normally used to calculate a local
texture value. This allows the network to infer an arbitrary texture as necessary to
help separate the classes. In the current study, the second approach (presenting the
network with a window of pixels) has been investigated.

1.3 Objectives and scope of the work 

The following are the specific objectives of the present study:

1. To compare the classification performance of error back-propagation artificial
neural network and Gaussian maximum likelihood classifiers;

2. To evaluate the performance of error back-propagation artificial neural
network classification while using

(a) different numbers of hidden nodes;

(b) different training set sizes;

3. To evaluate the performance of error back-propagation ANN using combined
neighborhood (texture) and spectral information.

1.4 Overview of data used and study area 

In this work, data from two different sites have been taken for the purpose of
analysis. They are the cities of Lucknow and Bhopal, the capitals of two north Indian
states, and adjoining areas. The location of these cities is shown in the map of
India in Figure 1.1.



Figure 1.1 Locations of the study sites 

The details of the data products for the study sites are given in Table 1.1. The data
used for the Lucknow site is IRS 1C, LISS III data comprising four multispectral
bands taken in Nov 1996. The area covered is predominantly urban and
agricultural in nature. It lies between Latitudes 26°49' N and 26°57' N
and Longitudes 80°55' E and 81°01' E. River Gomati passes through the middle
left portion of the region. The region to the south of the river is dominantly
urban, while towards the north agriculture and medium urban regions are present. In
the northern region a reserved forest (Kukrel) is clearly identifiable. The data
used for Bhopal is IRS 1B, LISS II data comprising four multispectral bands.
The image was taken in Jan 1995. The area is located between Latitudes 23° 12' 
N and 23°17'N and Longitudes 77°lt'E and 77°25'E. The area is urban and a 
lot of vegetation is also present. In the central part the big Bhopal lake can be
identified. The false color composites of the study sites using NIR, red and green
bands are shown in Plates 1.1 and 1.2.






Plate 1.1 False color composite (FCC) of Lucknow area generated using NIR,
red and green bands. 



Plate 1.2 False color composite (FCC) of Bhopal area generated using NIR, red 
and green bands. 





Table 1.1 Satellite data characteristics for study areas.

Site     Satellite/Sensor  Band    Resolution (m)  Size (pixels)  Wavelength (µm)  Spectral region  Path/Row
Lucknow  IRS 1C LISS III   Band 1  23.5            512 x 512      0.52-0.59        Green            100/052
                           Band 2  23.5            512 x 512      0.62-0.68        Red
                           Band 3  23.5            512 x 512      0.77-0.86        NIR
                           Band 4  70.5*           512 x 512      1.55-1.70        SWIR
Bhopal   IRS 1B LISS II    Band 1  36.5            512 x 512      0.45-0.52        Blue             51/28
                           Band 2  36.5            512 x 512      0.52-0.59        Green
                           Band 3  36.5            512 x 512      0.62-0.68        Red
                           Band 4  36.5            512 x 512      0.77-0.86        NIR

* resampled to 23.5 m.


1.5 Software and hardware details. 


Existing software has been used extensively. The main packages and
their uses are given below:

1. ILWIS (Integrated Land and Water Information System, ITC
Netherlands): for making false color composite images, selecting
sample sets and visualizing classified images.

2. JavaNNS (Java Neural Network Simulator): for building several networks,
training them and classifying imagery with the
trained network.

3. Microsoft Visual C++ 6.0: For creating and running various programs 
in 






The standard hardware configuration of the PC (known as GTS 1) includes a Cyrix
M II / IBM 6x86 MX-233 CPU, a DAEWOO 518X color monitor, a 20 GB
hard disk drive and 160 MB of RAM.

1.6 Organization of work

The thesis has been organized in 6 chapters. The current chapter provides a 
brief introduction to the problem, data sources and software and hardware 
details. Chapter 2 presents the literature review, which comprises an introduction to
basic concepts in classification, texture and neural networks, and a survey of the
work done by previous investigators in the field. Chapter 3 presents the theoretical
background for the error back-propagation training in neural networks. 
Chapter 4 describes methodology for classifying images using neural network. 
Chapter 5 presents results obtained from the study and discussion of the 
results, while chapter 6 gives a summary of conclusions. 





2 


Literature Review 


2.0 Introduction 


In this chapter, concepts related to classification, Gaussian maximum
likelihood (GML) classification and texture are introduced. They are followed
by a brief description of the biological neuron and its equivalent mathematical
representation, the artificial neuron. After this, a simple feedforward network
is described, followed by the assessment of optimum hidden layer
size and optimum sample set size for the network. The next section describes
accuracy assessment and the calculation of various statistics for any classification.
Lastly, work by recent investigators in the field of back-propagation artificial
neural networks is examined.

2.1 Classification or Pattern recognition 

Remotely sensed data can be used to extract useful information about the area 
to which the data belongs, which is thematic in nature. This process of 
converting data into useful information, which is of fundamental importance in 
remote sensing, is called Pattern Recognition or simply Classification. Before we go 
forward we can have a look at the terms pattern and class. In multi spectral 
classification (which means we are using remotely sensed data that has been 
obtained using more than one band of electromagnetic radiation), a set of 
values for a single pixel on each of a number of spectral bands may be referred 
to as a pattern. The variables that define the basis of such a measurement (e.g.
values in different bands) are called features. Thus, we can simply say that a
pattern is a measurement of values over a chosen set of features. When we talk
about class, we are generally referring to information class as against spectral
class. Information class refers to actual earth surface cover types while spectral
class is a grouping of pixels of similar spectral response. In remote sensing
applications we are more concerned with information classes because they
provide information that can be directly used. The way to achieve such a
classification is through supervised classification.

In a supervised classification, the identity and location of some of the land 
cover types, such as urban, water and agriculture, are known a priori, through 
several means other than classification, like field work, aerial surveys etc. 
Specific sites can be located corresponding to those cover classes on the image. 
These sites are called training sites. Once these training sites corresponding to 
different classes have been located their properties (spectral values) can be used 
to classify the whole image into those classes. This process of assigning a 
predetermined label on each of the pixels in the image is known as supervised 
classification. There are several methods to do this. The choice of a particular
classifier or decision process depends upon the nature of the input data and the
desired output. Parametric classification algorithms assume that the
measurement vectors corresponding to any training class are Gaussian in nature,
i.e. they follow a normal distribution, which is a statistical assumption. The
classifiers designed on this assumption are sometimes also called statistical
classifiers. The ones that come under this category are the traditional
classification schemes like parallelepiped, minimum-distance-to-means and
maximum likelihood classifiers.





However, the assumption of normality may not be valid in several practical
applications, in which classes may be skewed, have a double peak (in the case of
heterogeneous classes) etc. In such cases the use of parametric classifiers is
essentially flawed because the assumptions of the classification schemes are not
valid. A recent advancement in the field of supervised classification is the use of
Artificial Neural Networks, which is a non-parametric approach, meaning
that it does not have any inherent statistical
assumption.

2.2 Gaussian maximum likelihood classifier 

The maximum likelihood classification rule assigns each pixel having pattern
measurements or features X to the class c whose units are most probable or
likely to have given rise to feature vector X (Jensen, 1996). It assumes that the
training data statistics for each class in each band are normally distributed. This
classification makes use of the mean measurement vector $M_c$ for each class and
the covariance matrix of class c, $V_c$. The decision rule applied to the unknown
measurement vector X is

Decide X is in class c, iff

$$p_c \geq p_i, \quad \text{where } i = 1, 2, 3, \ldots \text{ all possible classes} \tag{2.1}$$

and

$$p_c = \left\{-0.5 \log_e\left[\det(V_c)\right]\right\} - \left[0.5\,(X - M_c)^T V_c^{-1} (X - M_c)\right] \tag{2.2}$$

where $\det(V_c)$ is the determinant of the covariance matrix $V_c$ and X is the
measurement vector to be classified.
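
The decision rule of equations (2.1) and (2.2) can be sketched in a few lines. This is a minimal illustration using NumPy, not the implementation used in this study; the function names, the class labels and the synthetic training pixels are all assumptions made for the example.

```python
import numpy as np


def train_gml(samples_by_class):
    """Estimate the mean vector M_c and covariance matrix V_c of each class."""
    stats = {}
    for label, pixels in samples_by_class.items():
        pixels = np.asarray(pixels, dtype=float)      # shape (n_pixels, n_bands)
        stats[label] = (pixels.mean(axis=0), np.cov(pixels, rowvar=False))
    return stats


def discriminant(x, mean, cov):
    """Evaluate p_c of equation (2.2) for one measurement vector x."""
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        # A singular covariance matrix cannot be inverted (see Section 1.1).
        raise ValueError("covariance matrix must be positive definite")
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)


def classify_gml(x, stats):
    """Apply equation (2.1): pick the class with the largest p_c."""
    x = np.asarray(x, dtype=float)
    return max(stats, key=lambda c: discriminant(x, *stats[c]))


# Synthetic four-band training pixels for two hypothetical classes.
rng = np.random.default_rng(0)
training = {
    "water": rng.normal([20, 15, 10, 5], 2.0, size=(40, 4)),
    "vegetation": rng.normal([30, 25, 80, 40], 4.0, size=(40, 4)),
}
stats = train_gml(training)
print(classify_gml([21, 14, 11, 6], stats))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of $V_c$ explicitly, which is a numerical convenience rather than a change to the rule.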





2.3 Texture 


Texture is generally understood as a neighborhood property, although it is
difficult to define. At a simple level texture can be thought of as the
variability in tone within a neighborhood, or the pattern of spatial relationships
among the gray levels of neighboring pixels (Shih and Schowengerdt, 1983), and
is sometimes described in terms such as "rough" or "smooth". Qualitatively,
texture is described by terms like coarseness, contrast, directionality, line-likeness,
regularity and roughness (Tamura et al., 1978). Coarseness depends upon the pattern
and size of texture elements. Contrast depends upon the dynamic range of gray levels and the
polarization of the distribution of black and white on the gray level histogram.
Directionality depends upon the shape of the elements and placement rules.
Line-likeness refers to the shape of the primitives of texture and supplements
coarseness, contrast and directionality. Regularity relates to the variation in placement
rules of the texture elements. Roughness is a mental concept related to real world
3-D objects.

In addition to these, texture is essentially a resolution dependent phenomenon. 
It implies that a change in the scale of imagery will change the whole 
perception of texture. With the above understanding in mind some definitions 
of texture can be given (Dikshit, 1992):

1. Those relationships between gray levels in the neighboring resolution 
cells which contribute to the overall appearance or visual characteristics 
of an image and are retained even, for example, when a colored scene is 
transformed into varying levels of gray, are collectively known as the 
texture of the image. 





2. Texture could be defined as a structure composed of a large number of
more or less ordered similar elements or patterns without one of these
drawing special attention so that the global unitary impression is offered
to the observer (Gool et al., 1985).

There are two approaches to texture. In the first approach, a one-dimensional Grey
Level Difference Histogram (GLDH) is calculated. Texture
properties such as angular second moment, contrast, mean and entropy
are then calculated from it and reported as texture. The angular second moment (asm) is
given as

$$\mathrm{asm} = \sum_{i=1}^{N} p(i)^2$$

where $p(i)$ is the normalized value at the $i$th gray level in the
GLDH and N is the quantization level. In the second approach, a two-dimensional
gray-tone spatial dependency matrix (also called gray-level co-occurrence
matrix, GLCM) is calculated. Haralick et al. (1979) proposed the gray-tone spatial
dependency matrix, which represents the distance and angular spatial 
relationships over an image sub-region of specified size. Each element of the 
gray-tone spatial dependency matrix is a measure of the probability of 
occurrence of two grayscale values separated by a given distance in a given 
direction. The number of adjacent pixels with gray levels i and j is counted and 
is placed in element (i, j) of the gray-tone spatial dependency matrix. Four 

definitions of adjacency are used: Horizontal (0°), vertical (90°), diagonal 

(bottom left to top right, 45°) and diagonal (top left to bottom right, 135°). 
Thus four gray-tone spatial dependency matrices can be calculated. The average 
of these four measures is normally output as the texture value for the pixel 
under consideration. Haralick proposed 32 texture features to be derived from 
each of the four matrices. Three of the more widely used include the angular 
second moment (ASM), contrast (CON) and correlation (COR). They differ
slightly in their definition. Shaban (1999) has reported similar results for both
GLDH and GLCM approaches; however, the GLDH approach is computationally
less expensive.
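
The GLDH-based angular second moment described above can be sketched as follows. This is an illustrative sketch only: the function names, the single displacement (dx, dy) and the zero-based histogram index are assumptions, and the study's own C programs may differ.

```python
import numpy as np


def gldh(window, dx=1, dy=0, levels=256):
    """Normalized gray level difference histogram for displacement (dx, dy).

    dx and dy are assumed non-negative. Entry d of the result is the fraction
    of pixel pairs whose absolute gray level difference equals d.
    """
    w = np.asarray(window, dtype=int)
    rows, cols = w.shape
    a = w[: rows - dy, : cols - dx]      # reference pixels
    b = w[dy:, dx:]                      # neighbors displaced by (dx, dy)
    diff = np.abs(a - b).ravel()
    hist = np.bincount(diff, minlength=levels).astype(float)
    return hist / hist.sum()


def asm(p):
    """Angular second moment: the sum of squared histogram values."""
    return float(np.sum(p ** 2))


uniform = np.full((3, 3), 7)
varied = np.array([[0, 1, 3],
                   [2, 2, 5],
                   [7, 4, 4]])
print(asm(gldh(uniform, levels=8)), asm(gldh(varied, levels=8)))
```

A uniform window concentrates the whole histogram in one bin and gives the maximum asm of 1, while a window with mixed differences spreads the histogram and lowers the value.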

In the present project, a window-based approach (Bischof et al., 1992) for
the implementation of texture has been adopted. In this approach, instead of
calculating a single texture value from the GLCM, the whole
neighborhood is given to the neural network as spectral values. This
enables the neural network to infer texture on its own.
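
One way to build such window-based input vectors is sketched below. The function name `window_features`, the ordering of values inside the vector and the decision to skip border pixels are assumptions made for illustration; the source does not specify how image borders are handled.

```python
import numpy as np


def window_features(image, size=3):
    """Stack the size x size neighborhood of every interior pixel into one vector.

    image has shape (rows, cols, bands). The result has shape
    (rows - size + 1, cols - size + 1, size * size * bands); border pixels
    without a full window are skipped (an assumption of this sketch).
    """
    img = np.asarray(image, dtype=float)
    rows, cols, bands = img.shape
    out_r, out_c = rows - size + 1, cols - size + 1
    feats = np.empty((out_r, out_c, size * size * bands))
    k = 0
    for dr in range(size):
        for dc in range(size):
            # Copy the pixel at window offset (dr, dc) for all windows at once.
            feats[:, :, k:k + bands] = img[dr:dr + out_r, dc:dc + out_c, :]
            k += bands
    return feats


image = np.arange(5 * 5 * 4).reshape(5, 5, 4)   # toy 5 x 5 image with 4 bands
features = window_features(image, size=3)
print(features.shape)
```

For a four-band image and a 3 x 3 window, each pixel is presented to the network as 36 input values instead of 4, so the input layer grows with the square of the window size.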


2.4 Artificial neuron 

An artificial neuron is the fundamental processing unit of any artificial neural 
network. It is the mathematical representation of the biological neuron. The 
biological neuron is briefly described in the next section. 

2.4.1 The biological neuron 

The biological neuron is shown in Figure 2.1 with its various connections with 
other neurons. A typical cell has three major regions: the cell body, which is 
also called soma, the axon, and the dendrites. Dendrites form a dendritic tree,
which is a very fine bush of thin fibers around the neuron's body. Dendrites
receive information from neurons through axons, long fibers that serve as
transmission lines. An axon is a long cylindrical connection that carries
impulses from the neuron. The end part of an axon splits into a fine
arborization. Each branch of it terminates in a small endbulb almost touching
the dendrites of neighboring neurons. The axon-dendrite contact organ is
called a synapse. The synapse is where the neuron introduces its signal to the
neighboring neuron.



Figure 2.1 The biological neuron (Zurada, 1999) 

2.4.2 The mathematical equivalent model 

A mathematical equivalent of the biological neuron is discussed in this section. 
A simple schematic diagram describing a mathematical model is shown in 
Figure 2.2. In the mathematical model, any neuron-processing unit (equivalent to
the biological processing unit, the soma) has several input activations, which are
equivalent to dendrites in the biological neuron. The relative importance of those
connections with respect to the neuron is manifested in the connecting weights
between them. The weighted sum of all the inputs is calculated and, depending
on the sum, an output response is calculated. This output is fed to neurons in the
next layer as input (similar to the functioning of the axon in the biological neuron).






Figure 2.2 A corresponding mathematical model of neuron i (Zurada, 1999)


The neuron output signal is given by the following relationship: 


O 


/(X w a x j)i 


(2.3) 


/'-> 


where n> is the weight corresponding to connection of the /th neuron to the /th 
signal. Xj is the intensity of yth signal. Subsequently an activation function is 
imposed upon the sum. A typical activation function is given as 


$$ f(x) = \frac{2}{1 + \exp(-\lambda x)} - 1 \qquad (2.4a) $$

where $\lambda > 0$. This is known as the general bipolar activation function
and is a continuous function. As $\lambda \to \infty$, the limit of this
function becomes the binary bipolar activation function, which is equivalent to
the sgn(x) function (+1 for all x > 0 and -1 for all x < 0). Unipolar
activation functions can be obtained by shifting and scaling the bipolar
activation functions, and are given as

$$ f(x) = \frac{1}{1 + \exp(-\lambda x)} $$

As $\lambda \to \infty$, the limit of this function becomes the binary unipolar
activation function (+1 for all x > 0 and 0 for all x < 0). The unipolar
function scales the output between 0 and 1. Various activation functions are
shown in Figure 2.3.
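
The neuron model and the two continuous activation functions described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the thesis: the function names and the default steepness `lam = 1.0` are assumptions made for the example.

```python
import math

def bipolar(x, lam=1.0):
    """Continuous bipolar activation: 2 / (1 + exp(-lam * x)) - 1, range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0

def unipolar(x, lam=1.0):
    """Continuous unipolar activation: 1 / (1 + exp(-lam * x)), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def neuron_output(weights, inputs, activation=unipolar, lam=1.0):
    """Weighted sum of the inputs followed by the activation function."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return activation(net, lam)

if __name__ == "__main__":
    # Hypothetical weights and inputs, chosen only for illustration.
    print(neuron_output([0.5, -0.5], [1.0, 1.0]))
    # A large lam makes the bipolar function approach sgn(x).
    print(bipolar(1.0, lam=50.0), bipolar(-1.0, lam=50.0))
```

The bipolar function is the unipolar one scaled by 2 and shifted by -1, which is the "shifting and scaling" relation mentioned in the text.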






Figure 2.3 (a) Bipolar and (b) unipolar activation functions (Zurada, 1999)

2.5 Artificial neural network


An artificial neural network is an interconnected network of artificial neurons.
With the understanding that a neuron has a number of inputs and one output,
we can construct connected networks made of these neurons. These neurons
can be connected in a number of ways. The most popular is the feedforward
network (Zurada, 1999). A multi-layer feedforward network (Figure 2.4)
consists of several layers of connected neurons. Any layer can have any number
of neurons. Initial excitations are input to all the neurons in the first layer
and each neuron produces just one output. These outputs from all the neurons
are fed as input excitations to all the neurons in the next layer and so on,
until a final output is achieved.


Figure 2.4 A layered feedforward network with three layers (Zurada, 1999)


2.6 Hidden layer size assessment 


In defining the network parameters, the input layer size is equal to the number
of inputs (bands) and the output layer size is equal to the number of classes.
With these values set, the remaining parameters are the number of hidden layers
and the number of nodes per hidden layer. If we assume a three-layer network,
the sole parameter to be considered is the number of nodes in the single hidden
layer. If we view both the neural network and the maximum-likelihood classifier
as decision making functions defined by a number of parameters (degrees of
freedom), then it is logical to use the same number of parameters in each for
comparison (Paola and Schowengerdt, 1995). For a three-layer network, assuming
one input node per image band and one output node per class, the number of
parameters as a function of network structure is


$$ N_{net} = 3 + \#\,\text{weights} = 3 + \text{hidden layer nodes} \times (\text{bands} + \text{classes}) \qquad (2.5a) $$

where the first term (3) is the number of parameters needed to specify the size
of each of the three layers. For the maximum-likelihood classifier, the
corresponding number of parameters is

$$ N_{ML} = 2 + \#\,\text{means} + \#\,\text{unique covariances} = 2 + \text{classes} \times \text{bands} + \tfrac{1}{2}\,\text{classes} \times (\text{bands}^2 + \text{bands}) \qquad (2.5b) $$

where 2 is the number of parameters required to specify features and classes.
Setting $N_{net}$ equal to $N_{ML}$ and solving for the number of hidden layer
nodes, we get

$$ \#\,\text{hidden layer nodes} = \frac{2 + \text{classes} \times \text{bands} + 0.5 \times \text{classes} \times (\text{bands}^2 + \text{bands}) - 3}{\text{bands} + \text{classes}} \qquad (2.6) $$
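
As a quick check of the parameter-matching argument, the sketch below evaluates the hidden layer size for four bands with ten and nine classes, the band and class counts this thesis uses later for its two study sites. The function names are hypothetical, and rounding the result to the nearest integer is an assumption of the example.

```python
def ml_parameters(bands, classes):
    """Parameters of the maximum-likelihood classifier: 2 + means + unique covariances."""
    return 2 + classes * bands + 0.5 * classes * (bands ** 2 + bands)

def network_parameters(bands, classes, hidden):
    """Parameters of a three-layer network: 3 layer sizes + all connecting weights."""
    return 3 + hidden * (bands + classes)

def hidden_nodes(bands, classes):
    """Hidden layer size that gives both classifiers the same number of parameters."""
    return (ml_parameters(bands, classes) - 3) / (bands + classes)

if __name__ == "__main__":
    for classes in (10, 9):
        h = hidden_nodes(4, classes)
        print(classes, "classes:", h, "-> rounded to", round(h))
```

For both cases the value rounds to 10 hidden nodes, which is consistent with the 4-10-10 and 4-10-9 configurations used in the methodology chapter.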


2.7 Optimum sample size assessment 

For classification, and later for accuracy assessment, the sample set is an
important parameter. The question is what minimum (or optimum) number of
pixels is required per class for the analysis to be statistically valid. There
are two schemes for this (Congalton and Green, 1999).





1. Binomial Distribution: The binomial distribution or the normal 
approximation to the binomial distribution is appropriate for computing the 
sample size for determining overall accuracy or the accuracy of an individual 
category. The equations are based on the proportion of correctly classified
samples and on some allowable error. 

2. Multinomial Distribution: In the case of an error matrix, it is not simply a 
question of correct or incorrect. Instead, it is a matter of which error or 
which categories are being confused. Given an error matrix with n land 
cover categories, for a given category there is one correct answer and n-1 
incorrect answers. Therefore, the use of binomial distribution for 
determining the sample size for an error matrix is not appropriate. Instead, 
the use of the multinomial distribution is suggested.

Next, in order to determine the required sample size, the precision $b_i$ for
each parameter in the multinomial population must be specified. Generally, for
assessing the accuracy of remotely sensed data, an absolute precision is set for
the entire classification and not for each category or each cell; therefore
$b_i = b$. If there is no a priori knowledge about the probability of
occurrence of each of the classes in the image, we can assume the worst-case
probability of 1/2. In this case the sample size N required to generate a valid
error matrix is given as

$$ N = \frac{B}{4b^2} \qquad (2.7) $$

where B is the upper $(\alpha/k) \times 100$th percentile of the $\chi^2$
distribution with one degree of freedom, k is the number of classes and b is
the absolute precision. For 95% confidence, from the $\chi^2$ table, we can
obtain the value of B as 7.879, while for 99% confidence the value is 10.825.
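
A small worked example of the sample size relation N = B / (4 b^2) is sketched below. The value B = 7.879 is the 95% value quoted above; the absolute precision b = 0.05 is an assumption made for this example (it is not stated in this section), and the equal split of N among the classes is likewise only illustrative.

```python
import math

def multinomial_sample_size(B, b):
    """Worst-case multinomial sample size N = B / (4 * b**2), rounded up."""
    return math.ceil(B / (4.0 * b ** 2))

if __name__ == "__main__":
    b = 0.05                      # assumed absolute precision of 5 percent
    N = multinomial_sample_size(7.879, b)
    print("total samples:", N)
    for classes in (10, 9):
        # Equal allocation per class, rounded up (an assumption of this sketch).
        print(classes, "classes:", math.ceil(N / classes), "samples per class")
```

Under these assumptions the total is 788 samples, i.e. 79 per class for ten classes and 88 per class for nine classes, which agrees with the testing sample sizes reported in the methodology chapter.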





2.8 Accuracy assessment 


We can assume that n samples are distributed into $k^2$ cells, where each
sample is assigned to one of k categories in the remotely sensed classification
and to one of the same k categories in the reference data set. Let $n_{ij}$
denote the number of samples classified into category i in the remotely sensed
classification and in category j in the reference data set. Letting

$$ n_{i+} = \sum_{j=1}^{k} n_{ij} \qquad (2.8) $$

be the number of samples classified into category i in the remotely sensed
classification and

$$ n_{+j} = \sum_{i=1}^{k} n_{ij} \qquad (2.9) $$

be the number of samples classified into category j in the reference data set,
the overall accuracy between the remotely sensed classification and the
reference data can be computed as follows:

$$ \text{Overall accuracy} = \frac{\sum_{i=1}^{k} n_{ii}}{n} \qquad (2.10) $$

Producer's accuracy (the complement of the error of omission) for class j can
be computed as

$$ \text{Producer's accuracy}_j = \frac{n_{jj}}{n_{+j}} \qquad (2.11) $$


User's accuracy (the complement of the error of commission) for class i can be
computed as

$$ \text{User's accuracy}_i = \frac{n_{ii}}{n_{i+}} \qquad (2.12) $$


Error of commission and error of omission are two separate measures of
accuracy, so kappa is used as a single measure of agreement that accounts for
both. The kappa analysis is a discrete multivariate technique used in accuracy
assessment, which removes chance agreement. The result of this analysis is the
K-hat value, which represents the classification accuracy of any class, as well
as the overall accuracy, after removing chance matches.

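The overall kappa, $\hat{K}$, for the classification is given as

$$ \hat{K} = \frac{n \sum_{i=1}^{k} n_{ii} - \sum_{i=1}^{k} n_{i+} n_{+i}}{n^2 - \sum_{i=1}^{k} n_{i+} n_{+i}} \qquad (2.13) $$

and the approximate large sample variance $\hat{\sigma}^2$ of the overall kappa
is given as

$$ \hat{\sigma}^2(\hat{K}) = \frac{1}{n} \left[ \frac{\theta_1 (1 - \theta_1)}{(1 - \theta_2)^2} + \frac{2 (1 - \theta_1)(2 \theta_1 \theta_2 - \theta_3)}{(1 - \theta_2)^3} + \frac{(1 - \theta_1)^2 (\theta_4 - 4 \theta_2^2)}{(1 - \theta_2)^4} \right] \qquad (2.14) $$

where

$$ \theta_1 = \frac{1}{n} \sum_{i=1}^{k} n_{ii} $$

$$ \theta_2 = \frac{1}{n^2} \sum_{i=1}^{k} n_{i+} n_{+i} $$

$$ \theta_3 = \frac{1}{n^2} \sum_{i=1}^{k} n_{ii} (n_{i+} + n_{+i}) $$

$$ \theta_4 = \frac{1}{n^3} \sum_{i=1}^{k} \sum_{j=1}^{k} n_{ij} (n_{j+} + n_{+i})^2 $$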


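The kappa, $\hat{K}_i$, and the large sample variance for kappa,
$\hat{\sigma}^2(\hat{K}_i)$, for any individual class i can be calculated using
the following formulae:

$$ \hat{K}_i = \frac{n\, n_{ii} - n_{i+} n_{+i}}{n\, n_{i+} - n_{i+} n_{+i}} \qquad (2.15) $$

$$ \hat{\sigma}^2(\hat{K}_i) = \frac{n (n_{i+} - n_{ii})}{[n_{i+} (n - n_{+i})]^3} \left[ (n_{i+} - n_{ii})(n_{i+} n_{+i} - n\, n_{ii}) + n\, n_{ii} (n - n_{i+} - n_{+i} + n_{ii}) \right] \qquad (2.16) $$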


The test statistic for deciding whether two classifications are significantly
different from each other is expressed by

$$ Z = \frac{|\hat{K}_1 - \hat{K}_2|}{\sqrt{\hat{\sigma}^2(\hat{K}_1) + \hat{\sigma}^2(\hat{K}_2)}} $$

At the 95% confidence level the critical value is 1.96, i.e. two error matrices
are significantly different with 95% confidence if the Z statistic for them is
greater than 1.96. If the Z statistic is greater than 3.0, the confidence level
goes up to 99%.
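
The accuracy measures of this section can be computed directly from an error matrix. The sketch below is a minimal illustration under stated assumptions: rows of the matrix are taken as classified categories and columns as reference categories, the function names are hypothetical, and the variance follows the large sample formula of Congalton and Green (1999) as understood here. The 2 x 2 matrix in the demonstration is made-up data.

```python
import math

def accuracy_report(m):
    """m[i][j] = samples classified as category i with reference category j."""
    k = len(m)
    n = sum(sum(r) for r in m)
    row = [sum(m[i]) for i in range(k)]                        # n_i+
    col = [sum(m[i][j] for i in range(k)) for j in range(k)]   # n_+j
    diag = sum(m[i][i] for i in range(k))
    chance = sum(row[i] * col[i] for i in range(k))
    kappa = (n * diag - chance) / (n * n - chance)
    t1 = diag / n
    t2 = chance / n ** 2
    t3 = sum(m[i][i] * (row[i] + col[i]) for i in range(k)) / n ** 2
    t4 = sum(m[i][j] * (row[j] + col[i]) ** 2
             for i in range(k) for j in range(k)) / n ** 3
    variance = (t1 * (1 - t1) / (1 - t2) ** 2
                + 2 * (1 - t1) * (2 * t1 * t2 - t3) / (1 - t2) ** 3
                + (1 - t1) ** 2 * (t4 - 4 * t2 ** 2) / (1 - t2) ** 4) / n
    return {
        "overall": diag / n,
        "producers": [m[j][j] / col[j] for j in range(k)],
        "users": [m[i][i] / row[i] for i in range(k)],
        "kappa": kappa,
        "variance": variance,
    }

def z_statistic(kappa1, var1, kappa2, var2):
    """Z statistic for comparing two independent kappa estimates."""
    return abs(kappa1 - kappa2) / math.sqrt(var1 + var2)

if __name__ == "__main__":
    a = accuracy_report([[45, 5], [10, 40]])   # illustrative data only
    b = accuracy_report([[40, 10], [15, 35]])  # illustrative data only
    print(a["overall"], a["kappa"], a["variance"])
    print(z_statistic(a["kappa"], a["variance"], b["kappa"], b["variance"]))
```

The sketch assumes every category has at least one classified and one reference sample; an empty row or column would need separate handling to avoid division by zero.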





2.9 Work by previous researchers 


Kanellopoulos et al. (1991) used a two-level hierarchical network to classify
land cover classes because, they argued, most land cover schemes are
hierarchical in nature; they reported little improvement in accuracy. Liu and
Xiao (1991) proposed block back-propagation or selective connection networks,
but they did not compare them with the corresponding fully connected network.

Heermann and Khazenie (1992) used a variety of training set sizes to conclude
that accuracy did not improve significantly for larger training sets. They
devoted a section of their paper to the preparation of their training set. They
refer to a 'picking and packing' technique. The 'picking' is accomplished by
first employing an unsupervised clustering algorithm. Then small homogeneous
regions are hand selected to represent the ground cover classes. Eliminating
duplicate pixels within a class avoids redundant calculations and ensures that
contradictory input information is not presented to the network, but it curtails
the network's ability to learn generalized heterogeneous input.

Paola and Schowengerdt (1995) used Landsat Thematic Mapper images of 
Tucson, AZ and Oakland, California to do a detailed comparison of the back- 
propagation neural network and maximum-likelihood classifiers for urban land 
use classification. They concluded that the neural network is a more robust
approach for heterogeneous classes because it is nonparametric in nature,
unlike conventional approaches.





Foody and Yates (1995) examined the effect of training size and composition
on artificial neural network classification. They observed that, in the
classification of the remotely sensed data set, the classification accuracy
increased significantly as a result of increasing the number of training cases
for abundant classes in the image. They also cautioned that the training set
must be selected carefully for the application in hand.

Foody (1995) examined the potential of neural networks for soft classification.
He tried to relate the network output value for a particular class in a mixed
pixel to the percentage of the pixel's area that the class actually covers on
the ground. He found that the output values in themselves were not strongly
correlated to the pixel compositions.

In a review article, Paola and Schowengerdt (1995) analyzed five aspects of
neural network classification, namely input encoding, output encoding,
architecture, training algorithms and comparison to conventional classifiers.
They suggested presenting the entire window of pixels normally used to
calculate texture, which allows the network to infer an arbitrary texture as
necessary to separate the classes. The pixel values should be normalized
between 0 and 1, which avoids the use of a scale vector each time the sigmoid
function is evaluated. Using raw values may lead to stalling of network learning
due to early saturation (Kanellopoulos and Wilkinson, 1997). They suggest that
generally for multi-spectral imagery a three-layer (single hidden layer) fully
interconnected network is sufficient and a four-layer network is unnecessary.

Foody and Arora (1997) studied the effect of four factors on neural network
classification accuracy, namely network architecture, training set size,
discriminating variables and testing data characteristics. They found that the
accuracy of neural network classification tends to increase with training set
size, although the increase in accuracy is not significant after a certain
training set size. The number of discriminating variables had a positive
correlation with accuracy, i.e. the accuracy was maximum when all the spectral
bands available were used for analysis. Also, classification accuracy increased
with the increase in testing set size. They also observed that neural network
architecture has no significant effect on classification accuracy.

As far as comparison with conventional classifiers (like GML) is concerned, 
most of the authors (Blonda et al, 1993; Fierens et al, 1994; Paola and 
Schowengerdt, 1995; Yoshida and Omatu, 1994; Kanellopoulos et al, 1993; Li 
et al., 1993; Bischof et al, 1992; Heermann and Khazenie, 1992; Kiang, 1992; 
Liu and Xiao, 1991) have reported similar or superior classification accuracy. 

In a paper suggesting strategies and best practice for neural network image 
classification Kanellopoulos and Wilkinson (1997) suggested a variety of 
measures. They suggested normalization of spectral values, avoiding the
"butterfly effect" (drastic change in output value for a small change in input
value), use of
conjugate descent algorithm and use of combined neural and non-neural 
methods for accuracy improvement. 

Summary of various data encoding and network structures used by several 
authors is described in Table 2.1.





Table 2.1 Summary of various data encoding and network structures (Paola and
Schowengerdt, 1995)

Author                       Imagery                             Input data encoding                  Network structure
---------------------------  ----------------------------------  -----------------------------------  ----------------------------
Benediktsson et al. (1990a)  MSS, elevation, slope, aspect data  Grey coding                          56-32-10, 56-32-4
Benediktsson et al. (1990b)  60 bands of simulated HIRIS         Binary coding, 12 bits per band      240-15-3, 480-15-3, 720-20-3
Hepner et al. (1990)         4 TM bands                          3x3 window in each band              36-10-4
Key et al. (1990)            Merged AVHRR and SMMR               Individual pixel values              7-10-12
[illegible]                  [illegible]                         Individual pixel values              6-18-54-20
[illegible]                  [illegible]                         Binary data, 8 bits per band         24-24-5
Bischof et al. (1992)        7 TM bands                          Coarse coding, also with 5x5 window  91-5-4, 116-8-4, 140-8-4
Li and Si (1992)             10 band airborne spectrometer       Individual pixel values              10-7-3
Wilkinson et al. (1992)      2 dates of 3 SPOT bands             Individual pixel values              [illegible]
Civco (1993)                 6 TM bands                          Individual pixel values              6-15-15
Dreyer (1993)                3 SPOT bands, texture calculations  Individual pixel values              3-13-12-9, 6-8-8-9, 42-7-7-9


The work done by various investigators in the field of back-propagation 
artificial neural network is summarized in Table 2.2 
















Table 2.2 Summary of the work done by recent investigators in the field of
back-propagation ANN.

Author                        Work
----------------------------  --------------------------------------------------
Kanellopoulos et al. (1991)   Two level hierarchical network for classification
Heermann & Khazenie (1992)    Effect of training set size on accuracy
Bischof (1992)                Use of window method as a texture measure
Foody (1995)                  Soft classification of the neural network output
Paola & Schowengerdt (1995)   Comparison of back-propagation neural network and
                              maximum likelihood classifier for urban environment
Foody & Yates (1995)          Effect of training set size and composition on
                              neural network accuracy
Foody and Arora (1997)        Effect of network architecture, training set size,
                              discriminating variables and testing data
                              characteristics on ANN classification accuracy







3 Theory of back-propagation ANN


3.0 Introduction 

In this chapter, the basic theory underlying back-propagation artificial neural 
network has been described. The chapter begins with a description of concept 
of supervised and unsupervised learning in neural network. After that a brief 
description of various ANN learning rules is given. That is followed by 
concepts of a multi-layer network and generalized delta-learning rule. Lastly, an 
algorithm for error back-propagation training has been given. Zurada (1999) 
has been taken as reference for the theory of artificial neural network. 

3.1 Supervised and unsupervised learning 

By learning, we mean a process of forcing a network to yield a particular 
response to a specific input. The concept of feedback plays a central role in 
learning. There are two types of learning: learning with supervision and learning 
without supervision. In supervised learning we assume that at each instant of 
time when the input is applied, the desired response d of the system is 
provided by the teacher. The distance $\rho[d, o]$ between the actual and desired
response serves as an error measure and is used to correct network parameters 
externally. Since adjustable weights are assumed, the teacher may implement a 
reward and punishment scheme to adapt the network's weight matrix w. 
Typically, supervised learning rewards accurate classifications or associations 
and punishes those which yield inaccurate responses. The teacher estimates the
negative error gradient direction and reduces the error accordingly. Most
supervised learning algorithms reduce to the stochastic minimization of error
in multi-dimensional weight space.

In learning without supervision, the desired response is not known; thus 
explicit error information cannot be used to improve network behavior. In this 
mode of learning, the network must discover for itself any possibly existing 
patterns, regularities, separating properties etc. Since no information is available 
as to correctness or incorrectness of responses, learning must be accomplished 
based on observations of responses to inputs. Thus, the technique of
unsupervised learning is often used to perform clustering, i.e. the
unsupervised classification of objects without providing information about the
actual class.



Figure 3.1 Block diagrams for (a) supervised and (b) unsupervised learning.








3.2 Neural network learning rules 


Consider the weight vector $\mathbf{w}_i$, or its components $w_{ij}$
connecting the jth input with the ith neuron. In general the jth input can be
an output of another neuron or it can be an external input. The following
general learning rule is adopted in neural network studies (Amari, 1990): the
weight vector $\mathbf{w}_i = [w_{i1}\; w_{i2}\; \ldots\; w_{in}]^t$ increases
in proportion to the product of input $\mathbf{x}$ and learning signal r. The
learning signal r is generally a function of $\mathbf{w}_i$, $\mathbf{x}$ and
the teacher's signal $d_i$. Thus, the increment of the weight vector
$\mathbf{w}_i$ produced by the learning step at time t is

$$ \Delta \mathbf{w}_i = c\, r[\mathbf{w}_i, \mathbf{x}, d_i]\, \mathbf{x} \qquad (3.1) $$

where c is a positive number called the learning constant.

Some important learning rules are listed below:

(a) Hebbian learning rule: It is suitable only for unsupervised learning.
For the Hebbian learning rule the learning signal is simply equal to the
neuron's output. Thus, we have

$$ r = f(\mathbf{w}_i^t \mathbf{x}) \qquad (3.1.1a) $$

and the weight increment becomes

$$ \Delta \mathbf{w}_i = c\, f(\mathbf{w}_i^t \mathbf{x})\, \mathbf{x} \qquad (3.1.1b) $$

(b) Perceptron learning rule: For the Perceptron learning rule the learning
signal is equal to the difference between the desired response and the
neuron's output. Thus, we have

$$ r = d_i - f(\mathbf{w}_i^t \mathbf{x}) \qquad (3.1.2a) $$

and the weight increment becomes

$$ \Delta \mathbf{w}_i = c\, [d_i - f(\mathbf{w}_i^t \mathbf{x})]\, \mathbf{x} \qquad (3.1.2b) $$





(c) Delta learning rule: The learning signal for this rule is defined as

$$ r = [d_i - f(\mathbf{w}_i^t \mathbf{x})]\, f'(\mathbf{w}_i^t \mathbf{x}) \qquad (3.1.3) $$

The term $f'(\mathbf{w}_i^t \mathbf{x})$ is the derivative of the activation
function $f$. The delta learning rule is only valid for continuous activation
functions. The delta learning rule can be generalized for the whole network
and is the basis of error back-propagation training. The generalized delta
rule is explained in section 3.3.3.
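
The three learning rules differ only in the learning signal r that multiplies the input. The sketch below shows one update step for each rule. It is an illustration, not the thesis code: the function names are hypothetical, the continuous activation is assumed to be the unipolar sigmoid with unit steepness, and the perceptron step is assumed to use a binary (threshold) output.

```python
import math

def f(net):
    """Unipolar continuous activation (assumed form, steepness 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def f_prime(net):
    """Derivative of the unipolar sigmoid: f(net) * (1 - f(net))."""
    o = f(net)
    return o * (1.0 - o)

def step(net):
    """Binary unipolar activation assumed for the perceptron rule."""
    return 1.0 if net > 0 else 0.0

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def update(w, x, r, c):
    """General rule: each weight changes by c * r * x."""
    return [wi + c * r * xi for wi, xi in zip(w, x)]

def hebbian_step(w, x, c):
    return update(w, x, f(dot(w, x)), c)                # r = neuron output

def perceptron_step(w, x, d, c):
    return update(w, x, d - step(dot(w, x)), c)         # r = d - output

def delta_step(w, x, d, c):
    net = dot(w, x)
    return update(w, x, (d - f(net)) * f_prime(net), c) # r = (d - f) * f'

if __name__ == "__main__":
    w, x = [0.0, 0.0], [1.0, 1.0]       # illustrative values only
    print(hebbian_step(w, x, c=0.1))
    print(perceptron_step(w, x, d=1.0, c=0.5))
    print(delta_step(w, x, d=1.0, c=0.5))
```

Only the delta rule needs the derivative of the activation, which is why it is restricted to continuous activation functions.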

3.3 ANN as a Classifier 

Patterns that are linearly separable can be classified by single layer
networks. Patterns that are more complex (those that cannot be divided into
separate classes by decision hyperplanes when plotted in the feature space)
require more complex networks. A multi-layer feedforward network can be used
to learn mappings of any complexity.

3.3.1 Multi layer network 

The generalized delta rule can be used for feedforward layered neural networks.
For simplicity, three continuous perceptron layers (Figure 2.4) are considered.
The network has I input neurons, J hidden neurons and K output neurons. The
input pattern is presented to the first layer to produce responses that act as
inputs to the second layer. Initially, the weights are assigned randomly. The
response of a neuron, which is the fundamental unit of all the layers, can be
calculated using the basic definition of the neuron functioning. Let the inputs
be $\mathbf{x} = [x_1\; x_2\; \ldots\; x_n]^t$ and the current weights
connecting any neuron in the subsequent layer to the inputs be
$\mathbf{w} = [w_1\; w_2\; \ldots\; w_n]^t$. The scalar product
$\mathbf{w}^t \mathbf{x}$ is the net stimulus for any particular neuron. Being
excited by this stimulus, the neuron produces an output which is governed by an
activation function like the sigmoid function $f(x) = 1/(1 + \exp(-x))$.

These outputs are produced by all the neurons in the layer being excited and
they act as activation for the next layer of neurons (in this case the final
layer). The process goes on until a final output from the final layer is
achieved. Now, supervised training comes into the picture. For each input
pattern that we present at the input nodes, there is a predetermined desired
response that we want corresponding to that input. This can be a value of 1 for
one output node and 0 for all others. The difference between the response that
we desire and the one that is actually produced is called the error. It is
calculated as $E = \frac{1}{2} \sum_{k} (d_k - o_k)^2$, where d and o are
respectively the desired and actual response vectors.

This error is used to modify the existing weights in such a manner that the
error produced with the modified weights and the current input vector, i.e. the
E calculated after modifying, is less than that before modifying. This is
achieved by the negative gradient descent formula. Thus, the network is trained
until the error becomes less than a threshold or stabilizes at a specific value
(i.e. the rate of decrease with iterations becomes slow).
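
The forward pass and the error measure described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: the layer weights are stored as one list of weights per neuron, no bias terms are used, and the weights and inputs in the demonstration are made-up values.

```python
import math

def sigmoid(x):
    """Unipolar sigmoid f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(weights, inputs):
    """weights[j] holds the weights of neuron j; one output per neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(v, w, z):
    """Hidden outputs y and final outputs o of a three-layer network."""
    y = layer_output(v, z)
    o = layer_output(w, y)
    return y, o

def error(d, o):
    """E = 1/2 * sum over output nodes of (d_k - o_k)^2."""
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))

if __name__ == "__main__":
    v = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]   # 2 hidden neurons, 3 inputs
    w = [[0.2, -0.3], [0.5, 0.1]]              # 2 output neurons
    z = [1.0, 0.5, 0.0]                        # normalized input pattern
    d = [1.0, 0.0]                             # desired response: class 1 of 2
    y, o = forward(v, w, z)
    print(o, error(d, o))
```

Training repeats this forward pass for every pattern and adjusts the weights so that the error decreases.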

3.3.2 Momentum method 

The objective of the momentum method is to accelerate the convergence of the
error back-propagation training algorithm. The method employs, in addition to
weight adjustment by gradient descent, an extra momentum term that pulls the
network towards faster convergence. A momentum constant ($\alpha$) is
multiplied by the weight correction of the previous step and added to the
correction of the current step:

$$ \Delta \mathbf{w}(t) = -\eta \nabla E(t) + \alpha \Delta \mathbf{w}(t-1) \qquad (3.2) $$
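
A single momentum update can be sketched as below. The function name is hypothetical; the default learning rate 0.2 and momentum constant 0.5 are the values reported later in the methodology chapter. The quadratic error E(w) = w^2 in the demonstration is an assumption chosen only to make the arithmetic easy to follow.

```python
def momentum_step(w, grad, prev_dw, eta=0.2, alpha=0.5):
    """Return updated weights and the weight correction applied in this step.

    dw(t) = -eta * grad(t) + alpha * dw(t - 1)
    """
    dw = [-eta * g + alpha * p for g, p in zip(grad, prev_dw)]
    new_w = [wi + d for wi, d in zip(w, dw)]
    return new_w, dw

if __name__ == "__main__":
    w, dw = [1.0], [0.0]
    for t in range(3):
        grad = [2.0 * w[0]]          # gradient of the toy error E(w) = w**2
        w, dw = momentum_step(w, grad, dw)
        print(t, w, dw)
```

The previous correction has to be stored between iterations; with `alpha = 0` the update reduces to plain gradient descent.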

3.3.3 Generalized delta learning rule 

Consider the three-layer neural network architecture (Figure 2.4). The first
layer is the input layer. The second (hidden) layer receives the activation
vector z and has corresponding connecting weights $v_{ji}$. This layer produces
the output vector y, which acts as the activation for the subsequent layer, the
connecting weights being $w_{kj}$. This layer produces the output vector o.

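The negative gradient descent formula for the hidden layer reads

$$ \Delta v_{ji} = -\eta \frac{\partial E}{\partial v_{ji}} \qquad (3.3) $$

for j = 1, 2, ..., J and i = 1, 2, ..., I.

If the sum of weight-activation products for the jth neuron is
$net_j = \sum_{i=1}^{I} v_{ji} z_i$, then

$$ \Delta v_{ji} = \eta\, \delta_{yj}\, z_i \qquad (3.4) $$

$\delta_{yj}$ being the error signal term of the hidden layer having output y.
The error signal term is equal to

$$ \delta_{yj} = -\frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial (net_j)} \qquad (3.5) $$

where

$$ E = \frac{1}{2} \sum_{k=1}^{K} \{ d_k - f[net_k(\mathbf{y})] \}^2 \qquad (3.6) $$

After calculation we get

$$ \frac{\partial E}{\partial y_j} = -\sum_{k=1}^{K} (d_k - o_k) \frac{\partial}{\partial y_j} \{ f[net_k(\mathbf{y})] \} \qquad (3.7) $$

We can simplify the above expression using the expressions for the output
layer error signal, $\delta_{ok} = (d_k - o_k) f'(net_k)$, and for
$net_k = \sum_{j=1}^{J} w_{kj} y_j$:

$$ \frac{\partial E}{\partial y_j} = -\sum_{k=1}^{K} \delta_{ok}\, w_{kj} $$

Putting the values back in equation (3.5), with
$\partial y_j / \partial (net_j) = f_j'(net_j)$, and rearranging:

$$ \delta_{yj} = f_j'(net_j) \sum_{k=1}^{K} \delta_{ok}\, w_{kj} \qquad (3.8) $$

The weight adjustment in the hidden layer now becomes

$$ \Delta v_{ji} = \eta\, f_j'(net_j)\, z_i \sum_{k=1}^{K} \delta_{ok}\, w_{kj} \qquad (3.9) $$

This expression is the so-called generalized delta learning rule.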
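
The hidden layer update of the generalized delta rule can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the activation is taken to be the unipolar sigmoid, for which f'(net) = f(net)(1 - f(net)), no bias terms are used, and all names and numerical values are hypothetical.

```python
import math

def f(net):
    """Unipolar sigmoid; its derivative is f(net) * (1 - f(net))."""
    return 1.0 / (1.0 + math.exp(-net))

def forward(v, w, z):
    """Hidden outputs y and network outputs o for input activations z."""
    y = [f(sum(v[j][i] * z[i] for i in range(len(z)))) for j in range(len(v))]
    o = [f(sum(w[k][j] * y[j] for j in range(len(y)))) for k in range(len(w))]
    return y, o

def error(d, o):
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))

def hidden_weight_updates(v, w, z, d, eta):
    """Corrections for the hidden layer weights v[j][i]."""
    y, o = forward(v, w, z)
    # Output layer error signals: (d_k - o_k) * f'(net_k)
    delta_o = [(d[k] - o[k]) * o[k] * (1.0 - o[k]) for k in range(len(o))]
    # Hidden layer error signals: f'(net_j) * sum_k delta_ok * w_kj
    delta_y = [y[j] * (1.0 - y[j]) *
               sum(delta_o[k] * w[k][j] for k in range(len(o)))
               for j in range(len(y))]
    return [[eta * delta_y[j] * z[i] for i in range(len(z))]
            for j in range(len(y))]

if __name__ == "__main__":
    v = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
    w = [[0.2, -0.3], [0.5, 0.1]]
    z, d = [1.0, 0.5, -1.0], [1.0, 0.0]
    print(hidden_weight_updates(v, w, z, d, eta=0.5))
```

Each correction equals the negative error gradient with respect to the corresponding hidden weight, scaled by the learning rate, which can be confirmed by comparing it with a finite difference estimate of that gradient.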





3.3.4 Error back propagation training 


The following flow chart depicts the error back propagation training algorithm 
(adapted from Zurada, 1999) 













Figure 3.2 Error Back Propagation Training Algorithm (EBPTA) flow chart 








4 Methodology


4.0 Introduction 


This chapter presents the methodology of the work undertaken. The first
section describes sample set selection for the various classifications done
later. The next section describes the various classification experiments
performed in the research. The last section describes in detail the use of the
JavaNNS software for classifying remotely sensed images and the analysis of
results.

4.1 Sample set selection 


False color composites of the given areas, generated using NIR, red and green 
bands were used for the selection of training and testing sample sets. The list of 
classes for the classification of images for the two study sites is given in Table 
4.1 for future reference. 


Table 4.1 Classes in the study sites

Class  Class  Class names          Class  Class  Class names
no.    abbr.  (for Lucknow)        no.    abbr.  (for Bhopal)
-----  -----  -------------------  -----  -----  -----------------
1      agr    Agriculture          1      w1     Water1
2      br     Barren land          2      w2     Water2
3      fo     Forest               3      wl     Wetland
4      hr     High residential     4      dv     Dense vegetation
5      hc     High commercial      5      sv     Sparse vegetation
6      mr     Medium residential   6      du     Dense urban
7      prk    Parks                7      mu     Medium urban
8      sc     Scrub                8      sc     Scrub
9             Agriculture2*        9      br     Barren
10     rv     River

* Fallow land




Three different sets of training samples and one set of testing samples were
selected for each site. The training samples consisted of 40, 80 and 120 pixels
per class (10n to 30n, n being the number of bands) as suggested by Mather
(1987) and Piper (1992). They are labeled as train40, train80 and train120. The
testing samples had 79 and 88 pixels per class for the two sites, as suggested
by equation (2.7). The statistical properties of the various sample sets are
listed in Appendix A.

4.2 Classifications 


Classifications were carried out using Gaussian maximum likelihood classifier 
as well as a variant of back-propagation artificial neural network (ANN). Three 
types of classifications were carried out: 

1. Classification using Gaussian maximum likelihood classifier using only
spectral values as well as using combined spectral and texture (asm)
features;

2. Classification using back-propagation ANN with only spectral values: 
Further, the classification with only spectral values was done in two 
ways: 

2a. Classification with back-propagation ANN, while using different 
number of hidden nodes. 

2b. Classification with back-propagation ANN, while using training sets 
of different sizes. 

3. Classification using back-propagation ANN using combined spectral and 
texture features. 





The Gaussian maximum likelihood classification was carried out using the ILWIS
software. The asm texture feature was generated using the program GLDH.c. The
classification using neural networks was carried out using JavaNNS. The
networks were trained by error back-propagation with a momentum term because it
ensures faster convergence (Zurada, 1999). Training was stopped when the error
decrease became slow, i.e. when the error versus iteration curve became flat.
The learning rate and momentum term for the training of networks were set to
0.2 and 0.5 respectively. The number of iterations for training of the various
networks was chosen as 1000. The pattern shuffling option was activated, i.e.
the training patterns were presented to the network in random order.

4.2.1 Classification using Gaussian maximum likelihood classifier 

While classifying using Gaussian maximum likelihood classification, no lower
threshold on the probabilities was specified, so there are no unclassified
pixels in the resulting classification. The train80 set was used as the
training sample. In GML classification with only spectral values, the four
spectral bands (as mentioned in Table 1.1) were used. For GML classification
using combined spectral and texture features, a texture band was also used in
addition to the four spectral bands. This texture band was generated using GLDH
from band 1, using the angular second moment (asm) criterion. The asm feature
is taken as representative of the conventional texture approach, as the study
of Shaban (1999) concludes that results obtained using different texture
features (asm, contrast, entropy etc.) are statistically similar. The window
sizes chosen for the generation of texture bands were 3, 5, 7 and 9. The
results obtained in this way are compared to the corresponding (same window
size) window based texture approach discussed in section 4.2.3.

4.2.2 Classification using back-propagation ANN with only spectral 
values 

In classification of images using neural network the normalized spectral values 
are fed to the input layer of the network, while desired output value depends 
on the class to which the training vector belongs. 


4.2.2.1 Classification with back-propagation ANN, while using different number of 
hidden nodes. 

The training set used for these methods was train80. For site 1 (Lucknow), the
networks had 4-n-10 configurations, where n is the number of nodes in the
hidden layer. For site 2 (Bhopal), the networks had 4-n-9 configurations. The
values of n chosen were 6, 8, 10, 12, 14, 16, 18, 20, 22 and 24 for both the
cases. The first (input) layer always had four nodes, equal to the number of
bands, while the third (output) layer had ten (or nine, depending upon the case
study) nodes, equal to the number of classes.

4.2.2.2 Classification with back-propagation ANN, while using training sets of 
different sizes 

Three different training sample sets train40 , train80 and train120 were used for 
the training of networks. A fixed configuration of network for each of the sites 
was selected and different sample sets were used to train it, one set at a time. 
The configuration used for case study 1 (Lucknow) was 4-10-10, while for case
study 2 (Bhopal) the configuration was 4-10-9.





4.2.3 Classification with back-propagation ANN while using combined 
spectral and texture features 

The method of implementation of combined spectral and texture features using
the window method is explained in Figure 4.1 for a 3 x 3 window case:


[Figure: the nine pixels a1 to a9 of the 3 x 3 window in Band1 and the single
pixels b2, b3 and b4 of Band2, Band3 and Band4 are fed to the neural network
input layer.]

Figure 4.1 Implementation of texture using window method (3x3
window example)







A similar procedure is used for larger window sizes. The number of input nodes,
when using three purely spectral bands (2, 3 and 4) and one band (band 1) that
supplies both spectral and texture information, is given as:

Number of input nodes = $3 + w^2$

where w is the size of the window. For window sizes 3, 5, 7 and 9, the number
of input nodes is 12, 28, 52 and 84 respectively.
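
The window method can be sketched as a function that assembles the input vector for one pixel. This is an illustrative sketch only: the function name, the ordering of the values inside the vector and the row-major scan of the window are assumptions, and pixels closer to the image border than half the window size are not handled.

```python
def window_input_vector(bands, row, col, w):
    """Build the 3 + w*w input values for the pixel at (row, col).

    bands: list of four 2-D lists [band1, band2, band3, band4];
    band1 supplies the w x w window, bands 2 to 4 supply the center pixel only.
    """
    half = w // 2
    band1 = bands[0]
    window = [band1[r][c]
              for r in range(row - half, row + half + 1)
              for c in range(col - half, col + half + 1)]
    center = [band[row][col] for band in bands[1:]]
    return center + window

if __name__ == "__main__":
    size = 9
    # Four synthetic 9 x 9 bands filled with illustrative values.
    bands = [[[b * 100 + r * size + c for c in range(size)]
              for r in range(size)] for b in range(4)]
    for w in (3, 5, 7, 9):
        print(w, len(window_input_vector(bands, 4, 4, w)))
```

The length of the returned vector is the number of input nodes required for the chosen window size.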

The number of hidden nodes was kept constant at 10, while the number of
output nodes was 10 and 9 for case study 1 (Lucknow) and case study 2
(Bhopal) respectively. The training sample used for training of the networks
was train80, as in the GML classification.

4.3 Classification of images using JavaNNS 

The neural network software used for image analysis is JavaNNS (originally
SNNS, the Stuttgart Neural Network Simulator; JavaNNS is a Java interface
written to make the software user friendly). This software can be
downloaded from http://www-ra.informatik.uni-tuebingen.de/SNNS/ .
JavaNNS is a neural network simulator written for general neural network
simulation. In order to make it useful for classifying remotely sensed imagery,
an interface was developed through various programs written in the C
programming language. JavaNNS offers many facilities to define different types
of networks and to operate on them. In this research, JavaNNS is used to
implement fully connected feedforward networks and back-propagation training
with momentum.

The software takes in a training file (.pat) to train a network. This file
consists of input activations and the corresponding desired output activations.
To generate such a file from image files, a C program called trainpat.c has
been written. This program takes band images and a training image of size
512 x 512 pixels in Idrisi .img (or any other byte (8 bit), binary) format and
generates a training file for the network. In the second version (trainpat2.c),
texture window files (.txt) can also be incorporated. The desired output value
for a class i pixel is 1 at the ith output node and 0 at all other nodes. For
example, for a pixel belonging to class 3, when the total number of classes is
8, the desired response is (0 0 1 0 0 0 0 0). Another file needed to classify
an image is the pattern file for the whole image. It differs from the training
file in that it does not use any training image, i.e. it does not contain any
desired outputs. The program that generates this file is citypat.c (texture
version citypat2.c). This program takes band images of 512 x 512 pixels and
generates a full image pattern file, which can later be used to classify the
full image once the network is trained.
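The desired-output coding described above can be sketched as follows. This is a minimal illustration and not the code of trainpat.c: the function names are hypothetical, the scaling of 8-bit values to the range 0 to 1 is an assumption, and the header lines that a real JavaNNS pattern file requires are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>

/* Fill out[0..n_classes-1] with the desired output activations for a pixel
 * of class class_id (1-based): 1 at that node, 0 at all other nodes. */
void desired_output(int class_id, int n_classes, int *out)
{
    assert(class_id >= 1 && class_id <= n_classes);
    for (int k = 0; k < n_classes; k++)
        out[k] = (k == class_id - 1) ? 1 : 0;
}

/* Write one training pattern: a line of input activations followed by a
 * line of desired outputs. Dividing by 255 is an assumed scaling, and the
 * pattern-file header expected by the simulator is not written here. */
void write_pattern(FILE *fp, const unsigned char *pixel, int n_inputs,
                   int class_id, int n_classes)
{
    for (int b = 0; b < n_inputs; b++)
        fprintf(fp, "%.4f ", pixel[b] / 255.0);
    fputc('\n', fp);
    for (int k = 1; k <= n_classes; k++)
        fprintf(fp, "%d ", (k == class_id) ? 1 : 0);
    fputc('\n', fp);
}
```

For the example in the text, `desired_output(3, 8, out)` fills `out` with (0 0 1 0 0 0 0 0). A full-image pattern file would use the same input line without the second line of desired outputs.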

The result file generated by JavaNNS is in .res format. It consists of the
output responses of all the pixels, one by one. The output response of a pixel
is its set of membership values in all the classes. A pixel is assigned to the
class for which its membership value is the highest. The program result.c takes
the JavaNNS output .res file and/or a classified .img file, together with a
training or testing .img file, and generates confusion (error) matrices, which
can be further used for statistical analysis using the program stats.c. A
step-by-step procedure for classification using JavaNNS is given below.

Steps in classification using JavaNNS using spectral features only:

Step 1 Convert image bands into Idrisi .img format.

Step 2 Select the sample sets (training and testing) using ILWIS and convert
them into Idrisi .img format.

Step 3 Use program trainpat.c to generate the JavaNNS training file.

Step 4 Use program citypat.c to generate the JavaNNS full pattern set file for
the whole image.

Step 5 Use JavaNNS to define a network and train it using the file generated
in step 3.

Step 6 Load the full pattern file (step 4) and save the result in a .res file.

Step 7 Use program result.c to generate the error matrix corresponding to the
.res file generated in step 6, and also to generate the JavaNNS
classified image in Idrisi .img format.

Step 8 Use program stats.c to calculate statistics for the error matrix
obtained in step 7 (a class name (.txt) file containing the names of the
classes should also be supplied).
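Steps 6 and 7 rest on two simple operations: assigning each pixel to the class with the highest membership value, and tallying the result against the reference class in an error matrix. The sketch below illustrates them under stated assumptions; the function names, the tie-breaking rule (lowest class number wins) and the matrix orientation (rows are classified classes, columns are reference classes) are choices made for this example and are not taken from result.c.

```c
#include <assert.h>

/* Return the 1-based class whose membership value is the highest.
 * Ties are resolved in favour of the lowest class number (assumption). */
int assign_class(const double *membership, int n_classes)
{
    assert(n_classes > 0);
    int best = 0;
    for (int k = 1; k < n_classes; k++)
        if (membership[k] > membership[best])
            best = k;
    return best + 1;
}

/* Add one pixel to an n x n error matrix stored row-major.
 * Rows index the classified class and columns the reference class; this
 * orientation is an assumed convention. */
void tally(long *matrix, int n_classes, int classified, int reference)
{
    assert(classified >= 1 && classified <= n_classes);
    assert(reference >= 1 && reference <= n_classes);
    matrix[(classified - 1) * n_classes + (reference - 1)] += 1;
}
```

Looping over all sample pixels with `assign_class` followed by `tally` yields the error matrix that is then passed on for statistical analysis.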

Steps in classification using JavaNNS using combined spectral and texture
features:

Step 1(a) Convert image bands into Idrisi .img format.

Step 1(b) Use win.c to generate the window band file (this file holds the
spectral values for any n x n window).

Step 2 Select the sample sets (training and testing) using ILWIS and convert
them into Idrisi .img format.

Step 3 Use program trainpat2.c to generate the JavaNNS training file (files
from steps 1 and 2 are used as inputs).

Step 4 Use program citypat2.c to generate the JavaNNS full pattern set file
for the whole image.

Step 5 Use JavaNNS to define a network and train it using the file generated
in step 3.

Step 6 Load the full pattern file (step 4) and save the result in a .res file.

Step 7 Use program result.c to generate the error matrix corresponding to the
.res file generated in step 6, and also to generate the JavaNNS
classified image in Idrisi .img format.

Step 8 Use program stats.c to calculate statistics for the error matrix
obtained in step 7 (a class name (.txt) file containing the names of the
classes should also be supplied).
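Step 1(b) collects, for every pixel, the spectral values of its n x n neighbourhood. A minimal sketch of that operation for a 512 x 512 byte image is given below. It is not the code of win.c: the function name is hypothetical, and the handling of image borders (repeating the nearest edge pixel) is an assumption, since the treatment of borders is not stated in the text.

```c
#include <assert.h>

#define IMAGE_SIZE 512  /* image side in pixels, as used in the text */

/* Copy the w x w neighbourhood centred on (row, col) from an 8-bit band
 * into out (row-major, w * w values). Positions that fall outside the image
 * take the value of the nearest edge pixel (assumed border rule). */
void extract_window(const unsigned char *band, int row, int col, int w,
                    unsigned char *out)
{
    assert(w > 0 && w % 2 == 1);
    assert(row >= 0 && row < IMAGE_SIZE && col >= 0 && col < IMAGE_SIZE);
    int half = w / 2;
    int n = 0;
    for (int dr = -half; dr <= half; dr++) {
        for (int dc = -half; dc <= half; dc++) {
            int r = row + dr;
            int c = col + dc;
            if (r < 0) r = 0;
            if (r >= IMAGE_SIZE) r = IMAGE_SIZE - 1;
            if (c < 0) c = 0;
            if (c >= IMAGE_SIZE) c = IMAGE_SIZE - 1;
            out[n++] = band[r * IMAGE_SIZE + c];
        }
    }
}
```

The w * w values written to `out`, together with the three single spectral values of the other bands, form the input activations of one pattern.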


Figure 4.1 Flow chart of classifying images using JavaNNS



5


Results and Discussion


5.0 Introduction 

The experiments conducted were explained in the previous chapter (section
4.2). The results of those experiments are presented in this chapter. The first
section presents the results of GML classification. The next section describes
the effect of the number of hidden nodes on neural network classification
accuracy. The generalization property of the neural network is discussed in the
following section. This is followed by the effect of training set size on
classification accuracy. Finally, the effect of the combined spectral and
texture method on classification accuracy is discussed.


5.1 GML Classification 


Results obtained from the GML classification of both the images are presented
in Table 5.1.


Table 5.1 Kappa values obtained for GML classification

For Lucknow                               For Bhopal
Group  Class  Class  Train   Test         Group  Class  Class  Train   Test
no.    no.    abbr.                       no.    no.    abbr.
1      1      agr    0.972   0.931        1      1      w1     1.000   1.000
       9      ag2    0.932   1.000               2      w2     1.000   1.000
       10     rv     1.000   1.000               3      wl     1.000   0.928
2      2      br     1.000   0.701               6      du     0.972   0.984
       3      fo     0.919   0.762               8      sc     0.986   1.000
       4      hr     0.835   0.793               9      br     1.000   1.000
       5      hc     0.758   0.662        2      4      dv     0.854   0.762
       6      mr     0.930   0.696               5      sv     0.896   1.000
       7      pk     0.858   0.469               7      mu     0.986   0.461
       8      sc     0.944   0.489
Overall              0.915   0.705        Overall              0.966   0.845


The GML classifier has performed well (kappa > 0.90), in both training and
testing, for classes agriculture, agriculture2 and river for the Lucknow site
and water1, water2, wetland, dense urban, scrub and barren for the Bhopal site
(Figure 5.1). These classes can be grouped as group 1. Most of these classes
have less spectral variation in terms of standard deviation (Appendix A). For
the other classes the accuracies are lower (kappa < 0.90) for training or
testing. The lower accuracies of these classes can be attributed to their
confusion with other classes. For example, in the Lucknow case class forest is
confused with classes park and scrub. Classes high residential and high
commercial are confused with each other. Other significant confusions are park
with scrub, and agriculture2 with medium residential and barren. In the Bhopal
case, class sparse vegetation is confused with dense vegetation, and class
medium urban is confused with dense urban and scrub. Classes other than group 1
are grouped as group 2. Most of the group 2 classes show higher spectral
variation in terms of standard deviation. The GML accuracies of group 2 classes
are shown in Figure 5.2.



Figure 5.1 GML accuracies of group 1 classes 


[Figure 5.2 panels: "GML accuracy (Lucknow) group 2 classes" and "GML accuracy (Bhopal) group 2 classes"; horizontal axis: classes]

Figure 5.2 GML accuracies of group 2 classes

The overall accuracy of the Bhopal image is higher than that of the Lucknow
image. This may be due to the difference in spatial resolution of the two
images. Accuracies tend to improve for lower resolution images because of the
averaging effect of classes.
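The kappa values discussed in this section are derived from error matrices. As an illustration, the overall kappa coefficient can be computed from an n x n error matrix as sketched below. This is a generic sketch of the standard formula and not the code of stats.c; the function name is hypothetical, and the class-wise kappa values listed in the tables would need the corresponding per-class form, which is not shown here.

```c
#include <assert.h>

/* Overall kappa coefficient of an n x n error matrix stored row-major:
 *   kappa = (N * sum(x_ii) - sum(x_i+ * x_+i)) / (N^2 - sum(x_i+ * x_+i))
 * where N is the total number of samples, x_i+ the row totals and x_+i the
 * column totals. The denominator is zero only when every sample falls in a
 * single class on both axes; that case is not handled in this sketch. */
double overall_kappa(const long *matrix, int n)
{
    double total = 0.0, diagonal = 0.0, chance = 0.0;
    for (int i = 0; i < n; i++) {
        double row_sum = 0.0, col_sum = 0.0;
        for (int j = 0; j < n; j++) {
            row_sum += (double)matrix[i * n + j];
            col_sum += (double)matrix[j * n + i];
        }
        total += row_sum;
        diagonal += (double)matrix[i * n + i];
        chance += row_sum * col_sum;
    }
    assert(total > 0.0);
    return (total * diagonal - chance) / (total * total - chance);
}
```

For a two-class matrix with 45 correct and 5 incorrect samples in each class, the function gives a kappa of 0.8, and a matrix with no off-diagonal entries gives 1.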

5.2 Effect of number of hidden nodes on classification accuracy 


The results obtained for varying numbers of hidden nodes are listed in Tables
5.2 and 5.3. The following are the main observations from these results.

(1) For Lucknow, group 1 classes show statistically similar† training
accuracies to GML classification for all hidden nodes. For testing, however,
there is a statistically significant drop in accuracy for all hidden nodes
except 12 nodes for class agriculture. For classes agriculture2 and river,
there is a statistically non-significant drop for all hidden nodes.

† Z value less than 1.96 (for 95% confidence)

(2) Amongst group 2 classes, barren, forest, high residential and high
commercial show statistically similar accuracy for both training and testing
for all hidden nodes. For classes medium residential and park there is a
significant improvement in testing accuracy for all hidden nodes. For class
scrub, hidden nodes 8 and 12 gave statistically poorer training results than
the others, while the testing results for all of the hidden nodes were similar
to the GML classifier.

(3) Amongst group 1 classes for the Bhopal case, there was a significant drop
in the testing accuracy of class wetland except for hidden nodes 16 and 18. The
accuracy for class dense urban dropped significantly in training for hidden
nodes 8, 16 and 20, while in testing the drop was there for all hidden nodes.
For other group 1 classes the results are statistically similar for all hidden
nodes. Thus, it appears that for group 1 classes the testing accuracies of ANN
classification can be poor as compared to GML classification accuracies.

(4) Amongst group 2 classes for the Bhopal site, class sparse vegetation gave
poor testing accuracy for hidden nodes 12 and 18, while all other hidden nodes
gave similar accuracy to GML classification. For the other classes in group 2
(dense vegetation and medium urban) accuracies are statistically similar for
all hidden nodes; however, testing accuracies improve (non-significantly) for
class medium urban.


Table 5.2 Kappa and Z-statistics values for Lucknow area when using different number of hidden nodes

[Table body illegible in the source scan]

* number of nodes in the hidden layer
** Z value = 0/0 (both classifications give perfect accuracy)

Table 5.3 Kappa and Z-statistics values for Bhopal area when using different number of hidden nodes

        Training accuracies                         Testing accuracies
Class   GML 6* 8 10 12 14 16 18 20 22 24            GML 6 8 10 12 14 16 18 20 22 24

[Table body illegible in the source scan]

(5) Overall training results for the Lucknow site are statistically similar to
GML classification for all hidden nodes, while for testing, the results for
hidden nodes 10, 12, 14, 16, 22 and 24 were statistically better than the GML
classifier. For other nodes the testing results were statistically similar to
the GML classifier. For Bhopal, all the nodes give statistically similar
results to the GML classifier for both training and testing, except for nodes 6
and 8, which gave statistically poor results for testing.

The above observations are put in a concise form in Table 5.4. Results not
mentioned in this table are statistically similar to GML classification. Some
important observations from the section are presented in Figure 5.3 and Figure
5.4.

Table 5.4 Summary of results for ANN classification when using different
number of hidden nodes.

Site      Statistically   Statistically poor    Statistically better   Statistically poor
          better*         (training)            (testing)              (testing)
          (training)
Lucknow   none            sc (nodes 8, 12)      Overall (all except    agr (all except 12 nodes)
                                                nodes 6, 8, 18, 20)    rv (nodes 6)
Bhopal    none            du (nodes 8, 16       none                   wl (all nodes except 16, 18)
                          and 20)                                      sv (nodes 12)
                                                                       Overall (nodes 6, 8)

* as compared to Gaussian maximum likelihood classifier


[Figure panel: "ANN accuracies (Lucknow)"]

Thus, for group 1 classes neural network methods may give lower accuracy than
GML classification, while for group 2 classes ANN methods are favorable. Among
neural networks with different numbers of hidden nodes, the performance of 6
and 8 hidden nodes is poor, whereas the other numbers of hidden nodes give
similar performance; so the choice of 10 hidden nodes, given by Equation (2.6)
as optimal, is correct. The results are in agreement with the observation by
Foody and Arora (1997) that network architecture does not have a significant
effect on accuracy.
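The statistical comparisons in this section use the Z statistic between two kappa values, with 1.96 as the critical value for 95% confidence. A minimal sketch of such a pairwise test is given below, assuming the usual form Z = |K1 - K2| / sqrt(var(K1) + var(K2)) and assuming that the variance of each kappa estimate has already been computed elsewhere. The function name is hypothetical, and the test is written on squared quantities so that no square root is needed.

```c
#include <assert.h>

/* Return 1 when two kappa estimates differ significantly, i.e. when
 *   |k1 - k2| / sqrt(var1 + var2) > z_crit,
 * and 0 otherwise. Both sides are squared to avoid the square root.
 * The variances are assumed to be supplied by the caller. The 0/0 case
 * (both classifications perfect, zero variance) is excluded by the
 * assertion and has to be treated separately. */
int kappa_differs(double k1, double var1, double k2, double var2,
                  double z_crit)
{
    double diff = k1 - k2;
    double var_sum = var1 + var2;
    assert(var1 >= 0.0 && var2 >= 0.0 && var_sum > 0.0 && z_crit > 0.0);
    return diff * diff > z_crit * z_crit * var_sum;
}
```

With `z_crit` set to 1.96, a return value of 0 corresponds to what the text calls statistically similar results.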

5.3 Generalization property of the neural network 

Training and testing accuracy values for the two classification schemes (GML
and ANN) are listed in Table 5.5. The network with 10 hidden nodes is taken for
comparison because in the previous analysis it has been shown to be optimal. It
can be observed from the table that neural networks have better generalization
capability than GML for classes medium residential and park for Lucknow and
medium urban for Bhopal. Thus, we see that for group 1 classes the neural
network does not have a better generalization ability in most of the cases,
while amongst group 2 classes better generalization is obtained for some
classes and worse for others.


Table 5.5 Comparison of GMLC and ANN for generalization ability.

For Lucknow area
          GML                              ANN (10 nodes)
Class     Train K (a)  Test K (b)  a-b     Train K (a)  Test K (b)  a-b
abbr.     %            %           %       %            %           %
agr       97.2         93.1        4       93.4         80.8        12
ag2       93.2         100.0       7       92.2         94.7        3
rv        100.0        100.0       0       100.0        98.6        1
br        100.0        70.1        30      100.0        70.6        29
fo        91.9         76.2        16      89.7         76.2        14
hr        83.5         79.3        5       84.3         73.9        10
hc        75.8         66.2        10      83.9         69.1        14
mr        93.0         69.6        23      96.9         94.1        3
pk        85.8         46.9        39      86.7         68.9        18
sc        94.4         48.9        45      92.1         51.2        41
overall   91.5         70.5        21      91.9         75.2        17

For Bhopal area
          GML                              ANN (10 nodes)
Class     Train K (a)  Test K (b)  a-b     Train K (a)  Test K (b)  a-b
abbr.     %            %           %       %            %           %
w1        100.0        100.0       0       98.6         98.7        0
w2        100.0        100.0       0       100.0        100.0       0
wl        100.0        92.8        7       100.0        74.1        26
du        97.2         98.4        1       95.9         90.7        5
sc        98.6         100.0       1       97.2         97.8        0
br        100.0        100.0       0       100.0        100.0       0
dv        85.4         76.2        9       93.8         68.8        25
sv        89.6         100.0       10      85.8         95.0        9
mu        98.6         46.1        53      95.7         50.8        45
overall   96.6         84.5        12      96.3         82.4        14



5.4 Effect of training set size on accuracy 

Testing performances of the various training schemes are listed in Table 5.6.
The following conclusions can be derived from the table.

(1) Sample sets train40 and train80 gave similar overall results for both
cases. For train120 the results are poorer than for the other two; however, the
difference is significant for Bhopal only (Figure 5.5).

(2) From a class-wise analysis of the Lucknow case, medium residential and park
classes give statistically poor results for train120; for the river class (no.
10), sample set train40 gives statistically poorer results than train120. For
other classes the difference is statistically non-significant. For the Bhopal
case, classes wetland and dense urban gave better results for train120. For the
medium urban class train40 gave better results, while for the barren class
train120 gave statistically poorer results than the other sets. For other
classes the results are statistically similar.


Table 5.6 Testing set kappa values when using different sized training sets.

For Lucknow
Class     40*     80      120     Z ab    Z ac    Z bc
abbr.     (a)     (b)     (c)
agr       0.872   0.803   0.775   1.221   1.657   0.451
ag2       0.847   0.907   0.948   0.912   1.660   0.722
rv        0.932   0.959   1.000   0.730   2.323   1.772
br        0.711   0.714   0.659   0.038   0.737   0.768
fo        0.861   0.710   0.847   1.575   0.150   1.365
hr        0.607   0.732   0.698   1.826   1.284   0.461
hc        0.663   0.734   0.684   0.824   0.257   0.658
mr        0.961   0.984   0.833   0.739   1.488   3.250
pk        0.815   0.806   0.607   0.144   3.210   3.003
sc        0.596   0.548   0.626   0.726   0.432   1.105
overall   0.761   0.775   0.743   0.623   0.792   1.413

For Bhopal
Class     40      80      120     Z ab    Z ac    Z bc
abbr.     (a)     (b)     (c)
w1        0.987   0.987   0.987   0.000   0.000   0.000
w2        1.000   1.000   1.000   -       -       -
wl        0.874   0.835   0.963   0.748   2.154   2.902
du        0.855   0.844   0.979   0.189   2.848   3.033
sc        1.000   1.000   1.000   -       -       -
br        0.986   0.985   0.872   0.063   2.981   2.912
dv        0.700   0.755   0.616   0.758   0.864   1.514
sv        0.698   0.952   0.502   4.495   2.907   8.599
mu        0.770   0.632   0.516   2.137   4.077   1.802
overall   0.872   0.872   0.790   0.000   4.119   4.119

* size of training sample set


[Figure: "Accuracy variation with training set size"; series train40, train80 and train120; horizontal axis: classes sv(bh), mu(bh), ov(bh), mr(lk), pk(lk), ov(lk)]

* bh = Bhopal, lk = Lucknow, ov = Overall.

Figure 5.5 ANN accuracies for different training set sizes (selected classes)




Thus, increasing the training set size does not improve accuracy. This is in
agreement with Heermann and Khazenie's (1992) finding that accuracy did not
improve significantly for larger training sets.

5.5 Effect of using texture 

All texture methods (described in section 4.2.3) significantly improved accuracy
for many classes. The kappa values and Z statistics obtained for the various
texture methods are listed in Tables 5.7 and 5.8. The following are the salient
observations from the results:

(1) For the Lucknow site, overall training and testing results obtained using
window sizes 5, 7 and 9 are statistically better than those obtained using the
corresponding (same window size) asm methods, while window size 3 gives
statistically better results for the testing set and similar results for the
training set. For the Bhopal site, windows 7 and 9 give better results for
training as compared to the corresponding asm methods; however, the testing
performance of window 9 is poor. Other results are statistically similar.

(2) Amongst group 2 classes, texture window methods are found to improve
accuracies statistically significantly over the corresponding asm method for
classes high commercial, medium residential and park for the Lucknow case and
class medium urban for the Bhopal case (Figure 5.6 and Figure 5.7). The best
window size for high commercial, medium residential and park is 7, 3 and 5
respectively for the Lucknow case, and 5 for medium urban in the Bhopal case.
In the Bhopal case, window texture methods significantly reduce the testing
accuracy of class sparse vegetation. This may be because sparse vegetation is
present surrounding many other classes like scrub and medium urban, and due to
the strong neighborhood interpretation of the window-based approach these
classes are getting classified as sparse vegetation.

(3) For group 1 classes, accuracies similar to those of the corresponding asm
methods are obtained in most of the cases; however, the agriculture2 class in
the Lucknow image gives statistically poor testing accuracies, while class
agriculture gives better testing accuracies for window-based methods as
compared to the corresponding asm methods.

Table 5.7 Comparison of conventional (asm) and window texture methods for Lucknow case study.

[Table body illegible in the source scan]

* spectral (b2, b3, b4) + asm 3x3 (b1)
** spectral (b2, b3, b4) + win 3x3 (b1)
† Zii refers to Z statistics between asm (i) and win (i).

Thus, the window-based texture approach is found useful for classes like high
commercial, medium residential, park and medium urban, while for classes that
are more uniform, like river, water etc., conventional texture approaches are
favorable. It appears that the window-based texture approach captures
neighborhood information in a better manner than conventional texture methods.
Overall results using window texture methods are better than those of the
corresponding asm methods in the Lucknow case; however, for the Bhopal case the
results are similar in most of the cases. This may be because Bhopal is
dominated by group 1 classes. Window sizes 5 and 7 are found better as compared
to other windows.


Figure 5.6 Texture accuracies for Lucknow site using conventional and window
approaches (selected classes)

Figure 5.7 Texture accuracies for Bhopal site using conventional and window
approaches (selected classes)




6 


Conclusion 


6.1 Conclusions 

The following conclusions can be made from the research: 

(1) The number of nodes in the hidden layer does not have much effect on
classification. However, 6 and 8 nodes give poor testing results as compared to
other numbers of nodes. Hence, a minimum of 10 nodes should be selected. This
is in agreement with the optimum number of nodes derived from theoretical
considerations.

(2) ANN gives better classification results than the GML classifier for some
classes that are non-uniform in nature, while for uniform classes GML is found
to be good enough and ANN may give similar or lower accuracy.

(3) Accuracy does not improve significantly with an increase in training set
size; in fact it can even go down. Thus, a minimal training set (10n) is good
enough for classification.

(4) The window-based texture approach is able to classify spectrally
non-uniform classes more accurately, while for uniform classes the conventional
texture approach is found better. This may be because of the better
interpretation of neighborhood information by the window-based approach.

(5) Overall, window sizes 5 and 7 are found to be better for classification of
urban environments as compared to other windows.

6.2 Recommendations for future work 

The output of JavaNNS is fuzzy in nature, i.e. membership values of the various
classes are given for each pixel. When a hard classification is made from these
values, the maximum of the membership values is selected, and the information
carried by the other membership values is lost. Neuro-fuzzy classification
techniques can be used so that the fuzzy information available at the output of
the neural network is utilized.



References 


Amari, S. I., 1990, Mathematical foundations of neurocomputing, Proceedings of
the IEEE, 78(9), 1443-1463.

Atkinson, P. M., and Tatnall, A. R. L., 1997, Neural networks in remote
sensing, International Journal of Remote Sensing, 18, 699-709.

Benediktsson, J. A., Swain, P. H., and Ersoy, O. K., 1990, Neural network
approaches versus statistical methods in classification of multi source
remotely sensed data, IEEE Transactions on Geoscience and Remote Sensing, 28,
540-551.

Bischof, H., Schneider, W., and Pinz, A. J., 1992, Multispectral classification
of Landsat-images using neural networks, IEEE Transactions on Geoscience and
Remote Sensing, 30, 482-490.

Blonda, P., La Forgia, V., Pasquariello, G., and Satalino, G., 1994,
Multispectral classification by a modular neural network architecture,
Proceedings of IGARSS'94, Pasadena, CA (Piscataway, NJ: I.E.E.E.), 1873-1876.

Civco, D. L., 1991, Landsat TM image classification with an artificial neural
network, in Proc. ASPRS-ACSM Annual Meeting, Baltimore, MD, 3, 67-77.

Congalton, R. G., and Green, K., 1999, Assessing the Accuracy of Remotely
Sensed Data: Principles and Practices, Lewis Publishers: FL.

Dreyer, P., 1993, Classification of land cover using optimized neural nets on
SPOT data, Photogrammetric Engineering and Remote Sensing, 59(5), 617-621.

Dikshit, O., 1992, The classification of texture in remotely sensed
environmental imagery, Ph.D. Thesis, University of Cambridge, U.K.

Fierens, F., Kanellopoulos, I., Wilkinson, G. G., and Megier, J., 1994,
Comparison and visualization of feature space behavior of statistical and
neural classifiers of satellite imagery, Proceedings of IGARSS'94, Pasadena, CA
(Piscataway, NJ: I.E.E.E.), 1880-1882.

Foody, G. M., and Arora, M. K., 1997, An evaluation of some factors affecting
the accuracy of classification by an artificial neural network, International
Journal of Remote Sensing, 18, 799-810.

Foody, G. M., et al., 1995, The effect of training set size and composition on
artificial neural network classification, International Journal of Remote
Sensing, 16, 1707-1723.

Foody, G. M., 1996, Relating the land cover composition of mixed pixels to
artificial neural network classification output, Photogrammetric Engineering
and Remote Sensing, 62(5), 491-499.

Gool, L. V., Dewaele, P., and Oosterlinck, A., 1985, Texture analysis, anno
1983, Computer Vision, Graphics, and Image Processing, 29, 336-357.

Haralick, R. M., 1979, Statistical and structural approaches to texture,
Proceedings of the IEEE, 67(5), 786-803.

Hara, Y., Atkins, R. G., Yueh, S. H., Shin, R. T., and Kong, J. A., 1994,
Application of neural networks to radar image classification, IEEE Transactions
on Geoscience and Remote Sensing, 32(1), 100-109.

Heermann, P. D., and Khazenie, N., 1992, Classification of multi spectral
remote sensing data using a back-propagation neural network, IEEE Transactions
on Geoscience and Remote Sensing, 30, 81-88.

Hepner, G. F., Logan, T., Ritter, N., and Bryant, N., 1990, Artificial neural
network classification using a minimal training set: comparison to conventional
supervised classification, Photogrammetric Engineering and Remote Sensing, 56,
469-473.

IRS 1C Data Users Handbook, NRSA, Hyderabad.

Jain, A. K., 1989, Fundamentals of Digital Image Processing, Prentice Hall,
Englewood Cliffs, NJ, U.S.A.

Jensen, J. R., 1996, Introductory Digital Image Processing: A Remote Sensing
Perspective, Prentice Hall, Englewood Cliffs, NJ, U.S.A.

Kanellopoulos, I., and Wilkinson, G. G., 1997, Strategies and best practice for
neural network image classification, International Journal of Remote Sensing,
18, 711-725.

Kanellopoulos, I., et al., 1992, Land cover discrimination in SPOT HRV imagery
using an artificial neural network - a 20 class experiment, International
Journal of Remote Sensing, 13, 917-924.

Kiang, R. K., 1992, Classification of remotely sensed data using OCR-inspired
neural network techniques, Proceedings of IGARSS'92, Houston, TX (Piscataway,
NJ: I.E.E.E.), 1081-1083.

Liu, Z. K., and Xiao, J. Y., 1991, Classification of remotely sensed image data
using artificial neural networks, International Journal of Remote Sensing, 12,
2433-2438.

Mather, P. M., 1987, Computer Processing of Remotely Sensed Images (Chichester:
Wiley).

Pao, Y. H., 1989, Adaptive Pattern Recognition and Neural Networks (Reading,
MA: Addison-Wesley).

Paola, J. D., and Schowengerdt, R. A., 1995, A detailed comparison of
back-propagation neural network and maximum-likelihood classifier for urban
land use classification, IEEE Transactions on Geoscience and Remote Sensing,
33, 981-995.



Appendix A 


Table A.1 Sample set characteristics (Lucknow, ...)

Class   avb1     avb2    avb3    avb4     sd1    sd2
agr     98.71    67.04   89.10   156.21   5.00   6.24
ag2     107.49   78.65   69.61   145.16   2.99   2.54
rv      88.28    50.64   26.91   77.76    3.00   1.75
br      [?]      93.11   81.76   179.07   6.51   5.91
fo      [?]      35.16   87.03   98.63    1.71   1.40
hr      [?]      68.53   56.14   109.57   [?]    4.21
hc      [?]      59.70   60.91   105.63   [?]    [?]
mr      [?]      79.16   72.26   132.04   6.77   5.90
pk      [?]      41.45   [?]     [?]      5.05   5.16
sc      [?]      35.78   93.13   84.16    1.56   0.94


Table A.2 Sample set characteristics (Lucknow, ...)

Class   avb1     avb2    avb3    avb4     sd1    sd2
agr     [?]      [?]     88.09   162.08   7.16   8.48
ag2     [?]      80.27   72.44   154.41   6.14   7.59
rv      86.09    50.25   27.21   60.78    5.01   2.65
br      119.71   93.19   85.41   174.70   5.69   5.51
fo      69.70    36.29   90.30   [?]      2.53   2.06
hr      97.08    66.11   55.34   107.43   4.40   3.72
hc      [?]      [?]     58.52   [?]      9.45   7.12
mr      106.11   73.76   70.52   125.72   6.39   5.48
pk      [?]      [?]     75.29   97.97    3.93   3.51
sc      73.95    38.03   [?]     96.25    [?]    [?]

* Taken over b1, b2 and b3 (b4 is resampled)


Table A.3 Sample set characteristics (BA 

Class        avb1         avb2         avb3         avb4         sd1          sd2
w1           [illegible]  15.74        [illegible]  12.77        0.85         0.61
w2           [illegible]  [illegible]  19.90        12.34        0.61         0.56
wl           [illegible]  16.89        18.90        27.99        0.55         [illegible]
[illegible]  [illegible]  28.17        41.10        43.15        1.40         1.04
[illegible]  [illegible]  19.98        [illegible]  38.05        0.62         0.32
[illegible]  [illegible]  24.56        34.00        [illegible]  1.48         1.24
[illegible]  [illegible]  23.05        30.05        28.79        1.40         1.07
sv           [illegible]  25.01        39.02        41.91        0.75         0.44
mu           [illegible]  19.26        [illegible]  [illegible]  0.06         0.79



Table A.4 Sample set characteristics (Bh 

Class        avb1         avb2         avb3         avb4         sd1          sd2
w1           [illegible]  16.48        16.05        [illegible]  0.76         0.52
w2           [illegible]  19.78        20.11        12.19        0.57         0.53
wl           [illegible]  17.06        19.31        27.82        0.50         0.76
du           [illegible]  22.74        [illegible]  [illegible]  [illegible]  [illegible]
sc           [illegible]  23.18        35.01        38.01        [illegible]  0.93
br           [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
sv           [illegible]  [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
mu           43.26        23.72        32.09        35.83        1.24         1.01


Appendix B 


List of various C programs 

het.c         For the calculation of various statistical measures for the
              sample sets.
win.c         For the generation of the window file from the band image.
trainpat.c    For the generation of the JavaNNS training file when using only
              spectral values.
citypat.c     For the generation of full pattern set files when using only
              spectral values.
trainpat2.c   For the generation of the JavaNNS training file when using
              combined spectral and texture values.
citypat2.c    For the generation of full pattern set files when using
              combined spectral and texture values.
result.c      For generating classified images and error matrices from
              JavaNNS result files.
stats.c       For the calculation of various statistical properties from the
              error matrices.
GLDH.c*       For the calculation of the asm texture band from the spectral
              band.


* Developed by Maruthi Ram Ponnapalli, M.Tech. (Geoinformatics), 2001, IIT Kanpur. 
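
The source of GLDH.c is not reproduced in this appendix. As a rough illustration only, the sketch below shows one way an asm value could be computed for a single window, assuming that "asm" stands for angular second moment and that it is taken over a gray-level difference histogram (the sum of squared relative frequencies of absolute gray-level differences between pixel pairs). The function name gldh_asm, its parameters, the 8-bit input and the displacement (dx, dy) are assumptions made for this example and are not taken from the original program.

```c
#include <assert.h>
#include <stdlib.h>

#define GRAY_LEVELS 256 /* assumption: 8-bit band values */

/*
 * Hypothetical helper: angular second moment (ASM) of the gray-level
 * difference histogram inside one square window.
 *
 * img            row-major band image, width x height pixels
 * (cx, cy)       centre pixel of the window
 * half           half window size (3 gives a 7 x 7 window)
 * (dx, dy)       displacement between the two pixels of each pair
 *
 * Returns sum_i p(i)^2, where p(i) is the fraction of pixel pairs whose
 * absolute gray-level difference equals i. Returns 0.0 if no pair fits.
 */
double gldh_asm(const unsigned char *img, int width, int height,
                int cx, int cy, int half, int dx, int dy)
{
    long hist[GRAY_LEVELS] = {0};
    long pairs = 0;
    double asm_value = 0.0;
    int x, y, i;

    assert(img != NULL && width > 0 && height > 0 && half >= 0);

    for (y = cy - half; y <= cy + half; y++) {
        for (x = cx - half; x <= cx + half; x++) {
            int x2 = x + dx;
            int y2 = y + dy;

            /* Both pixels of a pair must lie inside the image ... */
            if (x < 0 || x >= width || y < 0 || y >= height) continue;
            if (x2 < 0 || x2 >= width || y2 < 0 || y2 >= height) continue;
            /* ... and the displaced pixel must stay inside the window. */
            if (abs(x2 - cx) > half || abs(y2 - cy) > half) continue;

            hist[abs((int)img[y * width + x] - (int)img[y2 * width + x2])]++;
            pairs++;
        }
    }

    if (pairs == 0) return 0.0;

    /* Normalise the histogram and accumulate the squared frequencies. */
    for (i = 0; i < GRAY_LEVELS; i++) {
        double p = (double)hist[i] / (double)pairs;
        asm_value += p * p;
    }
    return asm_value;
}
```

Under these assumptions, a texture band could be built by calling the function once per pixel with half = 3 for a 7 x 7 window and writing the result, suitably rescaled, to the output band. A perfectly uniform window gives the maximum value of 1.0, and the value falls as the differences spread over more histogram bins.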



Plate C1.1 GML classified image of Lucknow. 



Plate C1.2 ANN classified image of Lucknow using 10 hidden nodes. 





[Image title: ANN Classified (Lucknow), training set: train40. Legend: agr, agr2, barren, forest, hcom, hres, mres, park, rv, scrub.] 

Plate C1.3 ANN classified image of Lucknow using train40 sample set. 



[Image title: ANN Classified (Lucknow), asm approach (7x7 window). Legend: agr, agr2, barren, forest, hcom, hres, mres, park, rv, scrub.] 

Plate C1.4 GML classified image of Lucknow using asm (7 x 7) texture approach. 










Plate C2.2 ANN classified image of Bhopal using 10 hidden nodes 



Plate C2.3 ANN classified image of Bhopal using train40 sample set. 








Plate C2.4 GML classified image of Bhopal using asm (7 x 7) texture approach 



[Image title: ANN Classified (Bhopal), window (7x7) based approach. Legend: barren, dveg, d_urban, m_urban, scrub, sveg, water1, water2, wetland.] 

Plate C2.5 ANN classified image of Bhopal using window (7 x 7) based texture approach. 




